Class loading of KafkaSerializerWrapper in Flink

2022-12-13 Thread wangshuai
When using a custom Kafka serializer, the corresponding class cannot be loaded. After some analysis: in the current code, KafkaSerializerWrapper is loaded first by the AppClassLoader, but classes written by the user as part of a submitted job cannot be loaded by the AppClassLoader. Loading them through the FlinkUserCodeClassLoader would require KafkaSerializerWrapper itself to also be loaded by the FlinkUserCodeClassLoader.
public void open(InitializationContext context) throws Exception {
    final ClassLoader userCodeClassLoader =
            context.getUserCodeClassLoader().asClassLoader();
    try (TemporaryClassLoaderContext ignored =
            TemporaryClassLoaderContext.of(userCodeClassLoader)) {
        serializer =
                InstantiationUtil.instantiate(
                        serializerClass.getName(),
                        Serializer.class,
                        getClass().getClassLoader());
                        // ?? it seems this should instead be:
                        // Thread.currentThread().getContextClassLoader()

        if (serializer instanceof Configurable) {
            ((Configurable) serializer).configure(config);
        } else {
            serializer.configure(config, isKey);
        }
    } catch (Exception e) {
        throw new IOException(
                "Failed to instantiate the serializer of class " + serializer, e);
    }
}
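
For reference, a minimal sketch of the change being suggested here, i.e. resolving
the serializer class against the context class loader that the surrounding
TemporaryClassLoaderContext has just installed (this is the reporter's proposal,
not a committed Flink change):

    serializer =
            InstantiationUtil.instantiate(
                    serializerClass.getName(),
                    Serializer.class,
                    // resolve against the user code class loader installed above,
                    // so classes shipped only in the user jar can be found
                    Thread.currentThread().getContextClassLoader());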

Re: Class loading of KafkaSerializerWrapper in Flink

2022-12-13 Thread Martijn Visser
Hi,

Please post to the Dev mailing list in English.

Best regards,

Martijn

On Tue, Dec 13, 2022 at 9:03 AM wangshuai  wrote:

>
> When using a custom Kafka serializer, the corresponding class cannot be loaded. After some analysis: in the current code, KafkaSerializerWrapper is loaded first by the AppClassLoader, but classes written by the user as part of a submitted job cannot be loaded by the AppClassLoader. Loading them through the FlinkUserCodeClassLoader would require KafkaSerializerWrapper itself to also be loaded by the FlinkUserCodeClassLoader.
> public void open(InitializationContext context) throws Exception {
> final ClassLoader userCodeClassLoader =
> context.getUserCodeClassLoader().asClassLoader();
> try (TemporaryClassLoaderContext ignored =
> TemporaryClassLoaderContext.of(userCodeClassLoader)) {
> serializer =
> InstantiationUtil.instantiate(
> serializerClass.getName(),
> Serializer.class,
> getClass().getClassLoader()); // ?? it seems this should instead be:
> Thread.currentThread().getContextClassLoader()
>
> if (serializer instanceof Configurable) {
> ((Configurable) serializer).configure(config);
> } else {
> serializer.configure(config, isKey);
> }
> } catch (Exception e) {
> throw new IOException("Failed to instantiate the serializer of class " +
> serializer, e);
> }
> }


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-13 Thread yu zelin
Hi, Timo,

Thanks for your suggestion. Recently I have discussed with @Godfrey He, 
@Shengkai Fang 
and @Jark Wu about the `RowFormat` (Thanks for all your suggestions). We 
finally came to 
a consensus which is similar to your suggestion. The details are as follows:

1. Add a REST query parameter ‘RowFormat’ = JSON/PLAIN_TEXT to tell the REST Endpoint
how to serialize the RowData in the ResultSet.

JSON format means the RowData will be serialized to JSON format, which 
contains original 
LogicalType information, so it can be deserialized back to RowData.

PLAIN_TEXT format means the RowData will be serialized to SQL-compliant, 
plain strings. 
The SQL Client can print the strings directly.

The example URI for fetching results is:
> /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT

2. Introduce two response bodies for fetching results in two formats.

For more details, please take a look at the FLIP 
[https://cwiki.apache.org/confluence/x/T48ODg]. 
I have updated it with an example of the query response bodies in the two formats in the
section:
Public Interface -> REST Endpoint Modification.
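
For illustration only, a rough sketch of what the two response shapes could look
like; the class and field names below are assumptions for this email, not the
classes defined in the FLIP:

    import java.util.List;
    import java.util.Map;

    // rowFormat=JSON: rows carry LogicalType information so they can be
    // mapped back to RowData on the client side.
    class JsonFetchResultsResponseBody {
        // column names plus their serialized LogicalType
        List<Map<String, String>> columns;
        // one JSON object per row, values typed according to the column type
        List<Map<String, Object>> data;
        // URI of the next fetch request
        String nextResultUri;
    }

    // rowFormat=PLAIN_TEXT: rows are pre-formatted, CAST(... AS STRING)-style
    // strings that the SQL Client can print directly.
    class PlainTextFetchResultsResponseBody {
        List<String> columnNames;
        // one list of printable strings per row
        List<List<String>> data;
        String nextResultUri;
    }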

> 2022年12月12日 18:09,Timo Walther  写道:
> 
> Hi everyone,
> 
> sorry to jump into this discussion so late.
> 
> > So we decided to revert the RowFormat related changes and let the client to 
> > resolve the print format.
> 
> Could you elaborate a bit on this topic in the FLIP? I still believe that we 
> need 2 types of output formats.
> 
> Format A: for the SQL Client CLI and other interactive notebooks that just 
> uses SQL CAST(... AS STRING) semantics executed on the server side
> 
> Format B: for JDBC SDK or other machine-readable downstream libraries
> 
> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string 
> representation depends on a session configuration option. Clients might not 
> be aware of this session option, so the formatting must happen on the server 
> side.
> 
> However, when the downstream consumer is a library, maybe the library would 
> like to get the raw millis/nanos since epoch.
> 
> Also nested rows and collections might be better encoded with format B for 
> libraries but interactive sessions are happy if nested types are already 
> formatted server-side, so not every client needs custom code for the 
> formatting.
> 
> Regards,
> Timo
> 
> 
> 
> On 06.12.22 15:13, godfrey he wrote:
>> Hi, zeklin
>>> The CLI will use default print style for the non-query result.
>> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
>> commands are clear.
>>> We think it’s better to add the root cause to the ErrorResponseBody.
>> LGTM
>> Best,
>> Godfrey
>> yu zelin  于2022年12月6日周二 17:51写道:
>>> 
>>> Hi, Godfrey
>>> 
>>> Thanks for your feedback. Below is my thoughts about your questions.
>>> 
>>> 1. About RowFormat.
>>> I agree to your opinion. So we decided to revert the RowFormat related 
>>> changes
>>> and let the client to resolve the print format.
>>> 
>>> 2. About ContentType
>>> I agree that the definition of the ContentType is not clear. But how to 
>>> define the
>>> statement type is another big question. So, we decided to only tell the 
>>> query result
>>> and non-query result apart. The CLI will use default print style for the 
>>> non-query
>>> result.
>>> 
>>> 3. About ErrorHandling
>>> I think reuse the current ErrorResponseBody is good, but parse the root 
>>> cause
>>> from the exception stack strings is quite hacking. We think it’s better to 
>>> add the
>>> root cause to the ErrorResponseBody.
>>> 
>>> 4. About Runtime REST API Modifications
>>> I agree, too. This part is moved to the ‘Future Work’.
>>> 
>>> Best,
>>> Yu Zelin
>>> 
>>> 
 2022年12月5日 18:33,godfrey he  写道:
 
 Hi Zelin,
 
 Thanks for driving this discussion.
 
 I have a few comments,
 
> Add RowFormat to ResultSet to indicate the format of rows.
 We should not require SqlGateway server to meet the display
 requirements of a CliClient.
 Because different CliClients may have different display style. The
 server just need to response the data,
 and the CliClient prints the result as needed. So RowFormat is not needed.
 
> Add ContentType to ResultSet to indicate what kind of data the result 
> contains.
 from my first sight, the values of ContentType are intersected, such
 as: A select query will return QUERY_RESULT,
 but it also has JOB_ID. OTHER is too ambiguous, I don't know which
 kind of query will return OTHER.
 I recommend returning the concrete type for each statement, such as
 "CREATE TABLE" for "create table xx (...) with ()",
 "SELECT" for "select * from xxx". The statement type can be maintained
 in `Operation`s.
 
> Error Handling
 I think current design of error handling mechanism can meet the
 requirement of CliClient, we can get the root cause from
 the stack (see ErrorResponseBody#errors). If it becomes a common
 r

[jira] [Created] (FLINK-30396) sql hint 'LOOKUP' which is defined in outer query block may take effect in inner query block

2022-12-13 Thread Jianhui Dong (Jira)
Jianhui Dong created FLINK-30396:


 Summary: sql hint 'LOOKUP' which is defined in outer query block 
may take effect in inner query block
 Key: FLINK-30396
 URL: https://issues.apache.org/jira/browse/FLINK-30396
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.16.0
Reporter: Jianhui Dong


As [flink 
doc|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#query-hints]
 said:

> {{Query hints}} can be used to suggest the optimizer to affect query 
> execution plan within a specified query scope. Their effective scope is 
> current {{Query block}} ([What are query blocks 
> ?|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#what-are-query-blocks-])
>  which {{Query Hints}} are specified.

But the SQL hint 'LOOKUP' defined in an outer query block can take effect in an inner query block, as in the following demo:
{code:java}
-- DDL
CREATE TABLE left_table (
    lid INTEGER,
    lname VARCHAR,
    pts AS PROCTIME()
) WITH (
    'connector' = 'filesystem',
    'format' = 'csv',
    'path' = 'xxx'
)

CREATE TABLE dim_table (
    id INTEGER,
    name VARCHAR,
    mentor VARCHAR,
    gender VARCHAR
) WITH (
    'connector' = 'jdbc',
    'url' = 'xxx',
    'table-name' = 'dim1',
    'username' = 'xxx',
    'password' = 'xxx',
    'driver' = 'com.mysql.cj.jdbc.Driver'
)

-- DML
SELECT /*+ LOOKUP('table'='outer') */
    ll.id AS lid,
    ll.name,
    r.mentor,
    r.gender
FROM (
    SELECT /*+ LOOKUP('table'='inner') */
        l.lid AS id,
        l.lname AS name,
        r.mentor,
        r.gender,
        l.pts
    FROM left_table AS l
    JOIN dim_table FOR SYSTEM_TIME AS OF l.pts AS r
    ON l.lname = r.name
) ll JOIN dim_table FOR SYSTEM_TIME AS OF ll.pts AS r ON ll.name = r.name{code}
The inner correlate will then carry two hints: {[LOOKUP inheritPath:[0] 
options:{table=inner}], [LOOKUP inheritPath:[0, 0, 0] options:{table=outer}]}, 
which IMO may be a bug.

The reason for the above case is that the hint 'ALIAS' now only works for join 
rel nodes and 'LOOKUP' works for correlate and join rel nodes.

I think the better way may be to make 'ALIAS' support both correlate 
and join rel nodes, like 'LOOKUP' does.
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-13 Thread yu zelin
Hi, everyone,

Looks like our new design is similar to Timo’s suggestion, and considering that 
there has been
no response from other devs for a while, I want to start the vote on 
Thursday.


Best,
Yu Zelin

> 2022年12月13日 16:23,yu zelin  写道:
> 
> Hi, Timo,
> 
> Thanks for your suggestion. Recently I have discussed with @Godfrey He, 
> @Shengkai Fang 
> and @Jark Wu about the `RowFormat` (Thanks for all your suggestions). We 
> finally came to 
> a consensus which is similar to your suggestion. The details are as follows:
> 
> 1. Add a REST query parameter ‘RowFormat’ = JSON/PLAIN_TEXT to tell the REST 
> Endpoint
> how to deserialize the RowData int ResultSet.
> 
> JSON format means the RowData will be serialized to JSON format, which 
> contains original 
> LogicalType information, so it can be deserialized back to RowData.
> 
> PLAIN_TEXT format means the RowData will be serialized to SQL-compliant, 
> plain strings. 
> The SQL Client can print the strings directly.
> 
> The example URI for fetching results is:
> > /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT
> 
> 2. Introduce two response bodies for fetching results in two formats.
> 
> For more details, please take a look at the FLIP 
> [https://cwiki.apache.org/confluence/x/T48ODg]. 
> I have updated it with an example of query response bodies in two format in 
> section:
> Public Interface -> REST Endpoint Modification.
> 
>> 2022年12月12日 18:09,Timo Walther  写道:
>> 
>> Hi everyone,
>> 
>> sorry to jump into this discussion so late.
>> 
>> > So we decided to revert the RowFormat related changes and let the client 
>> > to resolve the print format.
>> 
>> Could you elaborate a bit on this topic in the FLIP? I still believe that we 
>> need 2 types of output formats.
>> 
>> Format A: for the SQL Client CLI and other interactive notebooks that just 
>> uses SQL CAST(... AS STRING) semantics executed on the server side
>> 
>> Format B: for JDBC SDK or other machine-readable downstream libraries
>> 
>> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string 
>> representation depends on a session configuration option. Clients might not 
>> be aware of this session option, so the formatting must happen on the server 
>> side.
>> 
>> However, when the downstream consumer is a library, maybe the library would 
>> like to get the raw millis/nanos since epoch.
>> 
>> Also nested rows and collections might be better encoded with format B for 
>> libraries but interactive sessions are happy if nested types are already 
>> formatted server-side, so not every client needs custom code for the 
>> formatting.
>> 
>> Regards,
>> Timo
>> 
>> 
>> 
>> On 06.12.22 15:13, godfrey he wrote:
>>> Hi, zeklin
 The CLI will use default print style for the non-query result.
>>> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
>>> commands are clear.
 We think it’s better to add the root cause to the ErrorResponseBody.
>>> LGTM
>>> Best,
>>> Godfrey
>>> yu zelin  于2022年12月6日周二 17:51写道:
 
 Hi, Godfrey
 
 Thanks for your feedback. Below is my thoughts about your questions.
 
 1. About RowFormat.
 I agree to your opinion. So we decided to revert the RowFormat related 
 changes
 and let the client to resolve the print format.
 
 2. About ContentType
 I agree that the definition of the ContentType is not clear. But how to 
 define the
 statement type is another big question. So, we decided to only tell the 
 query result
 and non-query result apart. The CLI will use default print style for the 
 non-query
 result.
 
 3. About ErrorHandling
 I think reuse the current ErrorResponseBody is good, but parse the root 
 cause
 from the exception stack strings is quite hacking. We think it’s better to 
 add the
 root cause to the ErrorResponseBody.
 
 4. About Runtime REST API Modifications
 I agree, too. This part is moved to the ‘Future Work’.
 
 Best,
 Yu Zelin
 
 
> 2022年12月5日 18:33,godfrey he  写道:
> 
> Hi Zelin,
> 
> Thanks for driving this discussion.
> 
> I have a few comments,
> 
>> Add RowFormat to ResultSet to indicate the format of rows.
> We should not require SqlGateway server to meet the display
> requirements of a CliClient.
> Because different CliClients may have different display style. The
> server just need to response the data,
> and the CliClient prints the result as needed. So RowFormat is not needed.
> 
>> Add ContentType to ResultSet to indicate what kind of data the result 
>> contains.
> from my first sight, the values of ContentType are intersected, such
> as: A select query will return QUERY_RESULT,
> but it also has JOB_ID. OTHER is too ambiguous, I don't know which
> kind of query will return OTHER.
> I recommend returning the concrete type for each

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-13 Thread yu zelin
Hi everyone,

Sorry for the incorrect message in my last email. I want to start the vote
on Wednesday
as long as there are no questions in this period.

Best,
Yu Zelin

On Tue, Dec 13, 2022 at 5:08 PM yu zelin  wrote:

> Hi, everyone,
>
> Looks like our new design is similar to Timo’s suggestion, and considering
> that there has
> no response from other devs for a long time, I want to start the vote on
> Thursday.
>
>
> Best,
> Yu Zelin
>
> 2022年12月13日 16:23,yu zelin  写道:
>
> Hi, Timo,
>
> Thanks for your suggestion. Recently I have discussed with @Godfrey He,
> @Shengkai Fang
> and @Jark Wu about the `RowFormat` (Thanks for all your suggestions). We
> finally came to
> a consensus which is similar to your suggestion. The details are as
> follows:
>
> 1. Add a REST query parameter ‘*RowFormat*’ = JSON/PLAIN_TEXT to tell the
> REST Endpoint
> how to deserialize the RowData int ResultSet.
>
> JSON format means the RowData will be serialized to JSON format, which
> contains original
> *LogicalType* information, so it can be deserialized back to RowData.
>
> PLAIN_TEXT format means the RowData will be serialized to
> SQL-compliant, plain strings.
> The SQL Client can print the strings directly.
>
> The example URI for fetching results is:
> > /v2/sessions/:session_handle/operations/:operation_handle/result/:token?
> rowFormat=PLAIN_TEXT
>
> 2. Introduce two response bodies for fetching results in two formats.
>
> For more details, please take a look at the FLIP [
> https://cwiki.apache.org/confluence/x/T48ODg].
> I have updated it with an example of query response bodies in two format
> in section:
> Public Interface -> REST Endpoint Modification.
>
> 2022年12月12日 18:09,Timo Walther  写道:
>
> Hi everyone,
>
> sorry to jump into this discussion so late.
>
> > So we decided to revert the RowFormat related changes and let the client
> to resolve the print format.
>
> Could you elaborate a bit on this topic in the FLIP? I still believe that
> we need 2 types of output formats.
>
> Format A: for the SQL Client CLI and other interactive notebooks that just
> uses SQL CAST(... AS STRING) semantics executed on the server side
>
> Format B: for JDBC SDK or other machine-readable downstream libraries
>
> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string
> representation depends on a session configuration option. Clients might not
> be aware of this session option, so the formatting must happen on the
> server side.
>
> However, when the downstream consumer is a library, maybe the library
> would like to get the raw millis/nanos since epoch.
>
> Also nested rows and collections might be better encoded with format B for
> libraries but interactive sessions are happy if nested types are already
> formatted server-side, so not every client needs custom code for the
> formatting.
>
> Regards,
> Timo
>
>
>
> On 06.12.22 15:13, godfrey he wrote:
>
> Hi, zeklin
>
> The CLI will use default print style for the non-query result.
>
> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
> commands are clear.
>
> We think it’s better to add the root cause to the ErrorResponseBody.
>
> LGTM
> Best,
> Godfrey
> yu zelin  于2022年12月6日周二 17:51写道:
>
>
> Hi, Godfrey
>
> Thanks for your feedback. Below is my thoughts about your questions.
>
> 1. About RowFormat.
> I agree to your opinion. So we decided to revert the RowFormat related
> changes
> and let the client to resolve the print format.
>
> 2. About ContentType
> I agree that the definition of the ContentType is not clear. But how to
> define the
> statement type is another big question. So, we decided to only tell the
> query result
> and non-query result apart. The CLI will use default print style for the
> non-query
> result.
>
> 3. About ErrorHandling
> I think reuse the current ErrorResponseBody is good, but parse the root
> cause
> from the exception stack strings is quite hacking. We think it’s better to
> add the
> root cause to the ErrorResponseBody.
>
> 4. About Runtime REST API Modifications
> I agree, too. This part is moved to the ‘Future Work’.
>
> Best,
> Yu Zelin
>
>
> 2022年12月5日 18:33,godfrey he  写道:
>
> Hi Zelin,
>
> Thanks for driving this discussion.
>
> I have a few comments,
>
> Add RowFormat to ResultSet to indicate the format of rows.
>
> We should not require SqlGateway server to meet the display
> requirements of a CliClient.
> Because different CliClients may have different display style. The
> server just need to response the data,
> and the CliClient prints the result as needed. So RowFormat is not needed.
>
> Add ContentType to ResultSet to indicate what kind of data the result
> contains.
>
> from my first sight, the values of ContentType are intersected, such
> as: A select query will return QUERY_RESULT,
> but it also has JOB_ID. OTHER is too ambiguous, I don't know which
> kind of query will return OTHER.
> I recommend returning the concrete type for each statement, such as
> "CREATE TABLE" for "create table

[SUMMARY] Flink 1.17 Release Sync 12/13/2022

2022-12-13 Thread Leonard Xu
Hi devs and users,

I’d like to share the highlights about the 1.17 release sync on 12/13/2022.

- Release tracking page:
 - 1.17 development is moving forward [1]; we have 5 weeks remaining
 - @committers Please continuously update the progress in the 1.17 page

- Externalized Connectors :
  - flink-connector-aws v4.0.0 released
  - flink-connector-pulsar v3.0.0 started the VOTE
  - flink-connector-kafka PR is reviewing

- Blockers:
- Blockers FLINK-28766 and FLINK-29461 have been FIXED
- FLINK-29405 - InputFormatCacheLoaderTest is unstable OPEN 
  Qingsheng will have a look at the PR
- FLINK-26974 - Python EmbeddedThreadDependencyTests.test_add_python_file 
failed on azure OPEN 
  Xingbo is working on this
- FLINK-18356 - flink-table-planner Exit code 137 returned from process 
REOPENED 
  Leonard will ping Godfrey to take a look
- FLINK-27916 - HybridSourceReaderTest.testReader failed with 
AssertionError REOPENED
  Martijn will ping Thomas once more

- How to have monitoring and quality control for the externalized connectors? 
  Martijn will make a proposal and open a dev discussion.

- Will the feature freeze date be affected by COVID and the Spring Festival 
holiday?
  Leonard will discuss with the Chinese devs first and then make a proposal, if 
needed, before the next release sync.

The next release sync will be on December 27th, 2022, feel free to join us  if 
you are interested!

Google Meet: https://meet.google.com/wcx-fjbt-hhz
Dial-in: https://tel.meet/wcx-fjbt-hhz?pin=1940846765126  

Best regards,
Martijn, Qingsheng, Matthias and Leonard 

[1] https://cwiki.apache.org/confluence/display/FLINK/1.17+Release

[jira] [Created] (FLINK-30397) Remove Pulsar connector from master branch

2022-12-13 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-30397:
--

 Summary: Remove Pulsar connector from master branch
 Key: FLINK-30397
 URL: https://issues.apache.org/jira/browse/FLINK-30397
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Pulsar
Reporter: Martijn Visser
Assignee: Martijn Visser






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30398) Introduce S3 support for table store

2022-12-13 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-30398:


 Summary: Introduce S3 support for table store
 Key: FLINK-30398
 URL: https://issues.apache.org/jira/browse/FLINK-30398
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.3.0


S3 brings in a large number of dependencies, which can easily lead to class 
conflicts. We need a plugin mechanism to load the corresponding jars through a 
dedicated classloader.
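
As an illustration of the kind of isolation such a plugin mechanism provides (a sketch of the general technique only, not the table store's actual plugin code; Flink's existing plugin framework works along similar lines):

{code:java}
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch only: load filesystem jars from a plugin directory in an isolated
// URLClassLoader so their transitive dependencies (e.g. the AWS SDK) do not
// clash with classes on the application classpath.
public final class PluginLoader {

    public static ClassLoader forPluginDir(Path pluginDir) throws Exception {
        try (Stream<Path> jars = Files.list(pluginDir)) {
            URL[] urls = jars
                    .filter(p -> p.toString().endsWith(".jar"))
                    .map(PluginLoader::toUrl)
                    .toArray(URL[]::new);
            // parent = null keeps application classes invisible to the plugin;
            // a real implementation would whitelist a small set of shared APIs.
            return new URLClassLoader(urls, null);
        }
    }

    private static URL toUrl(Path p) {
        try {
            return p.toUri().toURL();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
{code}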



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-pulsar v3.0.0, release candidate #2

2022-12-13 Thread Chesnay Schepler

-1

The source release does not contain a LICENSE or NOTICE file.


On 12/12/2022 20:06, Martijn Visser wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the
flink-connector-pulsar version v3.0.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

Note: this is equivalent to the Pulsar connector that was released with
Flink 1.16. This is the externalized version.

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint
A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.0.0-rc2 [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Martijn

https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352588
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-pulsar-3.0.0-rc2/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1560/
[5] https://github.com/apache/flink-connector-pulsar/releases/tag/v3.0.0-rc2
[6] https://github.com/apache/flink-web/pull/589





Re: [VOTE] Release flink-connector-pulsar v3.0.0, release candidate #2

2022-12-13 Thread Martijn Visser
Thanks Chesnay. Cancelling the RC and will create another one

On Tue, Dec 13, 2022 at 11:46 AM Chesnay Schepler 
wrote:

> -1
>
> The source release does not contain a LICENSE or NOTICE file.
>
>
> On 12/12/2022 20:06, Martijn Visser wrote:
> > Hi everyone,
> > Please review and vote on the release candidate #2 for the
> > flink-connector-pulsar version v3.0.0, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > Note: this is equivalent to the Pulsar connector that was released with
> > Flink 1.16. This is the externalized version.
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> [2],
> > which are signed with the key with fingerprint
> > A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag v3.0.0-rc2 [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Martijn
> >
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352588
> > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-pulsar-3.0.0-rc2/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1560/
> > [5]
> https://github.com/apache/flink-connector-pulsar/releases/tag/v3.0.0-rc2
> > [6] https://github.com/apache/flink-web/pull/589
> >
>
>


[jira] [Created] (FLINK-30399) Enable connectors to use config docs generator

2022-12-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30399:


 Summary: Enable connectors to use config docs generator
 Key: FLINK-30399
 URL: https://issues.apache.org/jira/browse/FLINK-30399
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Common, Documentation
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.0


publish flink-docs and refactor the configuration as necessary to be usable 
from externalized connector repos.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30400) Stop bundling connector-base in externalized connectors

2022-12-13 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30400:


 Summary: Stop bundling connector-base in externalized connectors
 Key: FLINK-30400
 URL: https://issues.apache.org/jira/browse/FLINK-30400
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Reporter: Chesnay Schepler


Check that none of the externalized connectors bundle connector-base; if so 
remove the bundling and schedule a new minor release.

Bundling this module is highly problematic w.r.t. binary compatibility, since 
bundled classes may rely on internal APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] Release flink-connector-rabbitmq, release candidate #1

2022-12-13 Thread Martijn Visser
Hi everyone,
Please review and vote on the release candidate #1 for the version 3.0.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint
A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.0.0-rc1 [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352349
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-rabbitmq-3.0.0-rc1
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1563/
[5]
https://github.com/apache/flink-connector-rabbitmq/releases/tag/v3.0.0-rc1
[6] https://github.com/apache/flink-web/pull/594


[VOTE] Release flink-connector-pulsar, release candidate #3

2022-12-13 Thread Martijn Visser
Hi everyone,
Please review and vote on the release candidate #3 for the version 3.0.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint
A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.0.0-rc3 [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352588
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-pulsar-3.0.0-rc3
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1564/
[5] https://github.com/apache/flink-connector-pulsar/releases/tag/v3.0.0-rc3
[6] https://github.com/apache/flink-web/pull/589


Re: [VOTE] Apache Flink Kubernetes Operator Release 1.3.0, release candidate #1

2022-12-13 Thread Gyula Fóra
+1 (binding)

- Verified source, helm release and signatures
- Docs, javadocs look good
- Helm chart points to correct image, installed locally from helm repo and
run some apps
- Maven repo contents look clean and correct
- Checked release notes on JIRA

Looks great Matyas, awesome work!

Cheers
Gyula


On Mon, Dec 12, 2022 at 5:04 PM Maximilian Michels  wrote:

> +1 (binding)
>
> Source release looks good.
>
> 1. Downloaded, compiled, and verified the signature of the source release
> staged at
>
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.3.0-rc1/
> <
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.2.0-rc2/
> >
> 2. Verified licenses (Not a blocker: the LICENSE file does not contain a
> link to the bundled licenses directory, this should be fixed in future
> releases)
> 3. Verified Helm Chart
> 4. Deployed on K8s and ran Flink jobs
>
> -Max
>
> On Sat, Dec 10, 2022 at 12:17 PM Péter Váry 
> wrote:
>
> > +1 (non-binding)
> > 1. Checked the artifacts (binaries, headers, versions)
> > 2. Checked the signatures
> > 3. Compiled code
> > 4. Created docker images
> > 5. Run some manual tests
> > 6. Run the examples
> >
> > Other than one small issue (which is a local one) everything works like
> > charm.
> >
> > Thanks Matyas for taking care of this.
> >
> > Peter
> >
> > Hao t Chang  ezt írta (időpont: 2022. dec. 9., P,
> > 0:25):
> >
> > > Thank Matyas
> > >
> > > +1(non-binding)
> > > 1. Ran bundle generator with the
> > > ghcr.io/apache/flink-kubernetes-operator:b7e40da image
> > > 2. Deploy to K8s and OpenShift and create basic examples.
> > > 3. Ran bundle CI tests suits
> > >
> > > --
> > > Best,
> > > Ted Chang | Software Engineer | htch...@us.ibm.com
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-276: Data Consistency of Streaming and Batch ETL in Flink and Table Store

2022-12-13 Thread Piotr Nowojski
Hi Shammon,

Thanks for the explanations, I think I understand the problem better now. I
have a couple of follow up questions, but first:

>> 3. I'm pretty sure there are counter examples, where your proposed
mechanism of using checkpoints (even aligned!) will produce
inconsistent data from the perspective of the event time.
>>  a) For example what if one of your "ETL" jobs, has the following DAG:
>>
>>  Even if you use aligned checkpoints for committing the data to the sink
table, the watermarks of "Window1" and "Window2" are completely
independent. The sink table might easily have data from the Src1/Window1
from the event time T1 and Src2/Window2 from later event time T2.
>>  b) I think the same applies if you have two completely independent ETL
jobs writing either to the same sink table, or two to different sink tables
(that are both later used in the same downstream job).
>
> Thank you for your feedback. I cannot see the DAG in 3.a in your reply,

I've attached the image directly. I hope you can see it now.

Basically what I meant is that if you have a topology like (from the
attached image):

window1 = src1.keyBy(...).window(...)
window2 = src2.keyBy(...).window(...)
window1.join(window2, ...).addSink(sink)

or with an even simpler one (note there is no keyBy between `src` and `process`):

src.process(some_function_that_buffers_data).addSink(sink)

you will have the same problem. Generally speaking if there is an operator
buffering some data, and if the data are not flushed on every checkpoint
(any windowed or temporal operator, AsyncWaitOperator, CEP, ...), you can
design a graph that will produce "inconsistent" data as part of a
checkpoint.

Apart from that a couple of other questions/issues.

> 1) Global Checkpoint Commit: a) "rolling fashion" or b) altogether

Do we need to support the "altogether" one? The rolling checkpoint, being more
independent, seems like it could scale much better and avoid a lot of the
problems that I mentioned before.

> 1) Checkpoint VS Watermark
>
> 1. Stateful Computation is aligned according to Timestamp Barrier

Indeed, the biggest obstacle I see here is that we would most likely
have:

> b) Similar to the window operator, align data in memory according to
Timestamp.

for every operator.

> 4. Failover supports Timestamp fine-grained data recovery
>
> As we mentioned in the FLIP, each ETL is a complex single node. A single
> ETL job failover should not cause the failure of the entire "ETL
Topology".

I don't understand this point. Regardless of whether we are using
rolling checkpoints, all-at-once checkpoints or watermarks, I see the same
problems with non-determinism if we want to preserve the requirement of
not failing over the whole topology at once.

Both watermarks and the "rolling checkpoint" have, I think, the same issue: they
either require deterministic logic, or global failover, or downstream jobs can
only work on the records already committed by the upstream job. But working
with only "committed records" would either break consistency between
different jobs, or would cause a huge delay in checkpointing and e2e latency,
as:
1. the upstream job produces some data; the downstream job can not process
this data yet
2. checkpoint 42 is triggered on the upstream job
3. checkpoint 42 is completed on the upstream job, data processed since
last checkpoint has been committed
4. upstream job can continue producing more data
5. only now downstream can start processing the data produced in 1., but it
can not read the not-yet-committed data from 4.
6. once downstream finishes processing data from 1., it can trigger
checkpoint 42

The "all at once checkpoint", I can see only working with global failover
of everything.

This is assuming exactly-once mode. at-least-once would be much easier.

Best,
Piotrek

wt., 13 gru 2022 o 08:57 Shammon FY  napisał(a):

> Hi David,
>
> Thanks for the comments from you and @Piotr. I'd like to explain the
> details about the FLIP first.
>
> 1) Global Checkpoint Commit: a) "rolling fashion" or b) altogether
>
> This mainly depends on the needs of users. Users can decide the data
> version of tables in their queries according to different requirements for
> data consistency and freshness. Since we manage multiple versions for each
> table, this will not bring too much complexity to the system. We only need
> to support different strategies when calculating table versions for query.
> So we give this decision to users, who can use "consistency.type" to set
> different consistency in "Catalog". We can continue to refine this later.
> For example, dynamic parameters support different consistency requirements
> for each query
>
> 2) MetaService module
>
> Many Flink streaming jobs use application mode, and they are independent of
> each other. So we currently assume that MetaService is an independent node.
> In the first phase, it will be started in standalone, and HA will be
> supported later. This node will reuse many Flink modules, including REST,
> Gateway

[jira] [Created] (FLINK-30401) Add Estimator and Transformer for MinHashLSH

2022-12-13 Thread Fan Hong (Jira)
Fan Hong created FLINK-30401:


 Summary: Add Estimator and Transformer for MinHashLSH
 Key: FLINK-30401
 URL: https://issues.apache.org/jira/browse/FLINK-30401
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: Fan Hong


Add Estimator and Transformer for MinHashLSH.

Its function would be at least equivalent to Spark's 
org.apache.spark.ml.feature.MinHashLSH. The relevant PR should contain the 
following components:
 * Java implementation/test (Must include)
 * Python implementation/test (Optional)
 * Markdown document (Optional)
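
For context, MinHash approximates Jaccard similarity by hashing each input set with several independent hash functions and keeping the minimum hash value per function; similar sets then get similar signatures. A rough sketch of the signature computation (illustration only; the hash family, seed and numHashes below are illustrative choices, not the parameters of the planned MinHashLSH Estimator/Transformer):

{code:java}
import java.util.Arrays;
import java.util.Random;
import java.util.Set;

// Sketch only: computes a MinHash signature for a set of element ids.
final class MinHashSketch {
    private static final long PRIME = 2038074743L; // prime larger than the id space
    private final long[] a;
    private final long[] b;

    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) PRIME - 1);
            b[i] = rnd.nextInt((int) PRIME);
        }
    }

    long[] signature(Set<Long> elements) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (long e : elements) {
            long x = Math.floorMod(e, PRIME);
            for (int i = 0; i < a.length; i++) {
                // universal hash family (a*x + b) mod p; keep the minimum per function
                long h = (a[i] * x + b[i]) % PRIME;
                sig[i] = Math.min(sig[i], h);
            }
        }
        return sig;
    }
}
{code}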



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30402) Separate token framework generic and hadoop specific parts

2022-12-13 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-30402:
-

 Summary: Separate token framework generic and hadoop specific parts
 Key: FLINK-30402
 URL: https://issues.apache.org/jira/browse/FLINK-30402
 Project: Flink
  Issue Type: Sub-task
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30403) The reported latest completed checkpoint is discarded

2022-12-13 Thread Zdenek Tison (Jira)
Zdenek Tison created FLINK-30403:


 Summary: The reported latest completed checkpoint is discarded
 Key: FLINK-30403
 URL: https://issues.apache.org/jira/browse/FLINK-30403
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.16.0
Reporter: Zdenek Tison


There is a small window where the reported latest completed checkpoint can be 
marked as discarded before the new checkpoint has been reported. 

The reason is that the function _addCompletedCheckpointToStoreAndSubsumeOldest_ 
 is called before _reportCompletedCheckpoint_ in _CheckpointCoordinator._

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

2022-12-13 Thread Danny Cranmer
Hello Samrat,

Sorry for the late response.

+1 for a native Glue Data Catalog integration. We have
internally developed a Glue Data Catalog catalog implementation that shims
hive. We have been meaning to contribute, but this solution can replace our
internal one.

+1 for putting this in flink-connector-aws. With regards to
configuration, we have a flink-connector-aws-base [1] module where all the
common configurations should go. Please reuse anything common from there, such
as the authentication providers. Additionally, for any new configuration you
need to add, please consider putting it into aws-base if it might be reusable
for other AWS integrations.

> We will create an e2e integration test cases capturing all the
implementation in a mock environment.

It looks like localstack supports glue [2], and we already use localstack for
integration tests, so we can follow suit here.

Thanks,
Danny

[1]
https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
[2] https://docs.localstack.cloud/user-guide/aws/glue/

On Mon, Dec 12, 2022 at 12:18 PM Samrat Deb  wrote:

> Hi Konstantin Knauf,
>
> Can you explain how users are expected to authenticate with AWS Glue? I
>> don't see any catalog options regardng authx. So I assume the credentials
>> are taken from the environment?
>
>
> We are planning to put GlueCatalog in flink-connector-aws[1].
> flink-connector-aws already provides a base module with ready-made AWS configs [2].
> These configs can be reused for the catalog as well.
> I will update the FLIP-277[3] with the auth related configs in the
> Configuration Section.
>
> Users can pass these values as part of the config during catalog creation; if
> they are not provided, the catalog will try to fetch them from the environment.
> This will allow users to create multiple catalog instances in the same
> session pointing to different accounts. (I haven't tested multi-account
> glue catalog instances during the POC.)
>
> [1] https://github.com/apache/flink-connector-aws
> 
> [2]
> https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>
> Bests,
> Samrat
>
> On Mon, Dec 12, 2022 at 5:32 PM Samrat Deb  wrote:
>
>> Hi Jark,
>> Apologies for late reply.
>> Thank you for your valuable input.
>>
>> Besides, I have a question about Glue Namespace. Could you share the
>>> documentation of the Glue
>>>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
>>> Metaspace Mapping" section,
>>> if there is a database "mydb" under namespace "ns1", is that mean the
>>> database name in Flink is "ns1.mydb"?
>>
>> There is no concept of namespace in glue data catalog.
>> There are 3 levels in glue data catalog
>> - catalog
>> - database
>> - table
>>
>> I have added the mapping in FLIP-277[1]. and updated it .
>> it is directly database name from flink to database name in glue
>> Please ignore the typo leftover in doc previously.
>>
>> Best,
>> Samrat
>>
>>
>> On Fri, Dec 9, 2022 at 8:38 PM Jark Wu  wrote:
>>
>>> Hi Samrat,
>>>
>>> Thanks a lot for driving the new catalog, and sorry for jumping into the
>>> discussion late.
>>>
>>> As Flink SQL is becoming the first-class citizen of the Flink API, we are
>>> planning to push Catalog
>>> to become the first-class citizen of the connector instead of Source &
>>> Sink. For Flink SQL users,
>>> using Catalog is as natural and user-friendly as working with databases,
>>> rather than having to define
>>> DDL and schemas over and over again. This is also how Trino/Presto does.
>>>
>>> Regarding the repo for the Glue catalog, I think we can add it to
>>> flink-connector-aws. We don't need
>>> separate repos for Catalogs because Catalog is a kind of connector
>>> (others
>>> are sources & sinks).
>>> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
>>> flink-connector-jdbc, and HiveCatalog is
>>> in flink-connector-hive. This can reduce repository maintenance, and I
>>> think maybe some common
>>> AWS utils can be shared there.  cc @Danny Cranmer <
>>> dannycran...@apache.org>
>>> what do you think about this?
>>>
>>> Besides, I have a question about Glue Namespace. Could you share the
>>> documentation of the Glue
>>>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
>>> Metaspace Mapping" section,
>>> if there is a database "mydb" under namespace "ns1", is that mean the
>>> database name in Flink is "ns1.mydb"?
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> [1]:
>>>
>>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
>>> [2]:
>>>
>>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/sr

[jira] [Created] (FLINK-30404) Do not redeploy taskmanagers on standalone application scaling

2022-12-13 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-30404:
--

 Summary: Do not redeploy taskmanagers on standalone application 
scaling 
 Key: FLINK-30404
 URL: https://issues.apache.org/jira/browse/FLINK-30404
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora


The standalone deployment mode allows in-place scale ups when reactive mode is 
configured, but in other cases it deletes and redeploys both task and 
jobmanagers.

Without reactive mode, we could improve the current behaviour by not 
redeploying taskmanagers during scale changes. In theory only the jobmanager 
needs to be redeployed in these cases (as it cannot run multiple jobs in 
application mode).

This improvement would greatly improve the startup times in busy clusters and 
make scaling more robust.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30405) Add ResourceLifecycleStatus to CommonStatus and printer column

2022-12-13 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-30405:
--

 Summary: Add ResourceLifecycleStatus to CommonStatus and printer 
column 
 Key: FLINK-30405
 URL: https://issues.apache.org/jira/browse/FLINK-30405
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
 Fix For: kubernetes-operator-1.4.0


The CommonStatus api already contains a getter for the ResourceLifecycleState 
of a Flink resource.

We should remove the JsonIgnore annotation to expose this in the status. 

We should also expose this as a printer column instead of the reconciliation 
status that is used currently.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30406) Jobmanager Deployment error without HA metadata should not lead to unrecoverable error

2022-12-13 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-30406:
--

 Summary: Jobmanager Deployment error without HA metadata should 
not lead to unrecoverable error
 Key: FLINK-30406
 URL: https://issues.apache.org/jira/browse/FLINK-30406
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora
 Fix For: kubernetes-operator-1.4.0


Currently we don't have a good way of asserting that the job never started 
after savepoint upgrade when the JM deployment fails (such as on an incorrect 
image).

This easily leads to scenarios which require manual recovery from the user.

We should try to avoid this with some mechanism to greatly improve the 
robustness of savepoint upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30407) Better encapsulate error handling logic in controllers

2022-12-13 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-30407:
--

 Summary: Better encapsulate error handling logic in controllers
 Key: FLINK-30407
 URL: https://issues.apache.org/jira/browse/FLINK-30407
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
 Fix For: kubernetes-operator-1.4.0


The error handling in the FlinkDeployment and SessionJob controllers is a bit 
ad hoc and mostly consists of a series of try-catch blocks.

We should introduce a set of error handlers that we can encapsulate nicely and 
share between the controllers to reduce code duplication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30408) Add unit test for HA metadata check logic

2022-12-13 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-30408:
--

 Summary: Add unit test for HA metadata check logic
 Key: FLINK-30408
 URL: https://issues.apache.org/jira/browse/FLINK-30408
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
 Fix For: kubernetes-operator-1.4.0


The current mechanism to check for the existence of HA metadata in the operator 
is not guarded by any unit tests, which makes it more susceptible to accidental 
regressions.

We should add at least a few simple test cases to cover the expected behaviour



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT] [VOTE] Apache Flink Kubernetes Operator Release 1.3.0, release candidate #1

2022-12-13 Thread Őrhidi Mátyás
Hi Devs!

I'm happy to announce that we have unanimously approved this release.

There are 6 approving votes, 3 of which are binding:

   -

   Marton Balassi (binding)
   -

   Max Michels (binding)
   -

   Gyula Fora (binding)


   -

   Jim Busche (non-binding)
   -

   Ted Chang (non-binding)
   -

   Peter Vary (non-binding)

There are no disapproving votes.

Thanks everyone!

Matyas


[jira] [Created] (FLINK-30409) Support reopening closed metric groups

2022-12-13 Thread Mason Chen (Jira)
Mason Chen created FLINK-30409:
--

 Summary: Support reopening closed metric groups
 Key: FLINK-30409
 URL: https://issues.apache.org/jira/browse/FLINK-30409
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Affects Versions: 1.17.0
Reporter: Mason Chen


Currently, metricGroup.close() will unregister metrics and the underlying 
metric groups. If the metricGroup is created again via addGroup(), it will 
silently fail to create metrics since the metric group is in a closed state.

We need to close metric groups and reopen them because some of the metrics may 
reference old objects that are no longer relevant/stale and we need to 
re-create the metric/metric group to point to the new references. For example, 
we may close `KafkaSourceReader` to remove a topic partition from assignment 
and then recreate `KafkaSourceReader` with a different set of topic partitions. 
The metrics should also reflect that.
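
A minimal sketch of the re-assignment scenario described above (illustration only; the group and metric names are assumptions, and the internal close() call referred to in the ticket is not shown):

{code:java}
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

// Illustration only, not the actual KafkaSourceReader metrics code.
final class PartitionMetricsExample {

    static Counter registerPartitionMetrics(MetricGroup readerGroup, String topic, int partition) {
        MetricGroup partitionGroup =
                readerGroup.addGroup("topic", topic).addGroup("partition", String.valueOf(partition));
        // If this same group was closed earlier (e.g. the partition was unassigned and
        // later re-assigned), the counter below is silently never reported today; the
        // request is that addGroup() reopens the group (or returns a fresh one) so the
        // metric is registered again.
        return partitionGroup.counter("recordsConsumed");
    }
}
{code}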



--
This message was sent by Atlassian Jira
(v8.20.10#820010)