[jira] [Commented] (FLINK-32677) flink-benchmarks-regression-check failed to send slack messages since 2023.07.17
[ https://issues.apache.org/jira/browse/FLINK-32677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747310#comment-17747310 ] Yanfei Lei commented on FLINK-32677: Thanks for the response. I found the cause: the [token|https://apache-flink.slack.com/services/B05J88JAD6C] of the Jenkins integration was changed when it was reset. I updated the token in the [Jenkins credentials|http://codespeed.dak8s.net:8080/credentials/store/system/domain/_/credential/2d52c4a5-ab95-42f5-b9b7-eb1a1a95b232], and now it works again. :) > flink-benchmarks-regression-check failed to send slack messages since > 2023.07.17 > > > Key: FLINK-32677 > URL: https://issues.apache.org/jira/browse/FLINK-32677 > Project: Flink > Issue Type: Bug > Components: Benchmarks >Reporter: Yanfei Lei >Priority: Critical > > {code:java} > Response Code: 404 rel="alternate" type="text/html" > href="http://codespeed.dak8s.net:8080/log"/>2717802023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Response Code: 404 > Slack post may have failed. Response: > null href="http://codespeed.dak8s.net:8080/log"/>2717792023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Slack post may have failed. Response: null {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-32677) flink-benchmarks-regression-check failed to send slack messages since 2023.07.17
[ https://issues.apache.org/jira/browse/FLINK-32677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanfei Lei resolved FLINK-32677. Resolution: Fixed > flink-benchmarks-regression-check failed to send slack messages since > 2023.07.17 > > > Key: FLINK-32677 > URL: https://issues.apache.org/jira/browse/FLINK-32677 > Project: Flink > Issue Type: Bug > Components: Benchmarks >Reporter: Yanfei Lei >Priority: Critical > > {code:java} > Response Code: 404 rel="alternate" type="text/html" > href="http://codespeed.dak8s.net:8080/log"/>2717802023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Response Code: 404 > Slack post may have failed. Response: > null href="http://codespeed.dak8s.net:8080/log"/>2717792023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Slack post may have failed. Response: null {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32678) Test FLIP-285 LeaderElection
[ https://issues.apache.org/jira/browse/FLINK-32678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747317#comment-17747317 ] Matthias Pohl commented on FLINK-32678: --- [~wangyang0918] I'm wondering whether we could use this test as an umbrella test for two other topics that ended up in the 1.18 release: * FLINK-32032 (flink-shaded update including netty) * FLINK-32468 (Migration from Akka to Pekko) They are quite fundamental changes and hard to test in a manual way. The proposal would be to do a stress test on Flink 1.18. For FLINK-32032 and FLINK-32468, observing that nothing breaks might be good enough. WDYT? > Test FLIP-285 LeaderElection > > > Key: FLINK-32678 > URL: https://issues.apache.org/jira/browse/FLINK-32678 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Assignee: Yang Wang >Priority: Blocker > Labels: release-testing > Fix For: 1.18.0 > > > We decided to do another round of testing for the LeaderElection refactoring > which happened in > [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+refactoring+leaderelection+to+make+flink+support+multi-component+leader+election+out-of-the-box]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32677) flink-benchmarks-regression-check failed to send slack messages since 2023.07.17
[ https://issues.apache.org/jira/browse/FLINK-32677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747319#comment-17747319 ] Martijn Visser commented on FLINK-32677: That makes sense [~Yanfei Lei] - Thanks for fixing it! > flink-benchmarks-regression-check failed to send slack messages since > 2023.07.17 > > > Key: FLINK-32677 > URL: https://issues.apache.org/jira/browse/FLINK-32677 > Project: Flink > Issue Type: Bug > Components: Benchmarks >Reporter: Yanfei Lei >Priority: Critical > > {code:java} > Response Code: 404 rel="alternate" type="text/html" > href="http://codespeed.dak8s.net:8080/log"/>2717802023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Response Code: 404 > Slack post may have failed. Response: > null href="http://codespeed.dak8s.net:8080/log"/>2717792023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Slack post may have failed. Response: null {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-32677) flink-benchmarks-regression-check failed to send slack messages since 2023.07.17
[ https://issues.apache.org/jira/browse/FLINK-32677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser reassigned FLINK-32677: -- Assignee: Yanfei Lei > flink-benchmarks-regression-check failed to send slack messages since > 2023.07.17 > > > Key: FLINK-32677 > URL: https://issues.apache.org/jira/browse/FLINK-32677 > Project: Flink > Issue Type: Bug > Components: Benchmarks >Reporter: Yanfei Lei >Assignee: Yanfei Lei >Priority: Critical > > {code:java} > Response Code: 404 rel="alternate" type="text/html" > href="http://codespeed.dak8s.net:8080/log"/>2717802023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Response Code: 404 > Slack post may have failed. Response: > null href="http://codespeed.dak8s.net:8080/log"/>2717792023-07-19T11:38:29Z2023-07-19T11:38:29ZJul > 19, 2023 11:38:29 AM jenkins.plugins.slack.StandardSlackService publish > WARNING: Slack post may have failed. Response: null {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-31634) FLIP-301: Hybrid Shuffle supports Remote Storage
[ https://issues.apache.org/jira/browse/FLINK-31634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747330#comment-17747330 ] Konstantin Knauf commented on FLINK-31634: -- Thanks, [~tanyuxin]. > FLIP-301: Hybrid Shuffle supports Remote Storage > > > Key: FLINK-31634 > URL: https://issues.apache.org/jira/browse/FLINK-31634 > Project: Flink > Issue Type: New Feature > Components: Runtime / Network >Affects Versions: 1.18.0 >Reporter: Yuxin Tan >Assignee: Yuxin Tan >Priority: Major > Labels: Umbrella > > This is an umbrella ticket for > [FLIP-301|https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32402) FLIP-294: Support Customized Catalog Modification Listener
[ https://issues.apache.org/jira/browse/FLINK-32402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747331#comment-17747331 ] Konstantin Knauf commented on FLINK-32402: -- [~zjureel] Great. There is a PR that adds documentation for the pluggable error classification under Deployment -> Advanced (https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced). Would this be a good place? > FLIP-294: Support Customized Catalog Modification Listener > -- > > Key: FLINK-32402 > URL: https://issues.apache.org/jira/browse/FLINK-32402 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Ecosystem >Affects Versions: 1.18.0 >Reporter: Fang Yong >Assignee: Fang Yong >Priority: Major > > Issue for > https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Catalog+Modification+Listener -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32468) Replace Akka by Pekko
[ https://issues.apache.org/jira/browse/FLINK-32468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32468: -- Release Note: Flink's RPC framework is now based on Apache Pekko instead of Akka. Any Akka dependencies were removed. > Replace Akka by Pekko > - > > Key: FLINK-32468 > URL: https://issues.apache.org/jira/browse/FLINK-32468 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Reporter: Konstantin Knauf >Assignee: Chesnay Schepler >Priority: Critical > Labels: pull-request-available > Fix For: 1.18.0 > > > Akka 2.6.x will not receive security fixes from September 2023 onwards (see > https://discuss.lightbend.com/t/2-6-x-maintenance-proposal/9949). > A mid-term plan to replace Akka is described in FLINK-29281. In the meantime, > we suggest to replace Akka by Apache Pekko (incubating), which is a fork of > Akka 2.6.x under the Apache 2.0 license. This way - if needed - we at least > have the ability to release security fixes ourselves in collaboration with > the Pekko community. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32666) ASM rewrite class lead to package failed.
[ https://issues.apache.org/jira/browse/FLINK-32666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747333#comment-17747333 ] lizhiqiang commented on FLINK-32666: I found that this is a JDK bug. I think we should require a minimum JDK patch version to keep users off the affected releases, because this error is particularly difficult to troubleshoot and wastes a lot of time. [https://bugs.openjdk.org/browse/JDK-8191969] was fixed in 8u172. [~martijnvisser] > ASM rewrite class lead to package failed. > - > > Key: FLINK-32666 > URL: https://issues.apache.org/jira/browse/FLINK-32666 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.18.0 >Reporter: lizhiqiang >Priority: Major > > {code:java} > [DEBUG] Processing JAR > /Users/lzq/Desktop/Projects/Flink/flink/flink-master/flink-table/flink-table-planner/target/flink-table-planner_2.12-1.17-SNAPSHOT.jar > [DEBUG] Rewrote class bytecode: > org/apache/calcite/interpreter/JaninoRexCompiler.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$Frame.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/ImmutableRelBuilder$Config.class > [DEBUG] Rewrote class bytecode: org/apache/calcite/tools/RelBuilder.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$OverCallImpl.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$GroupKey.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/RelBuilder$OverCall.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$RelOptTableFinder.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/ImmutableRelBuilder$Config$Builder.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$Registrar.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/RelBuilder$OverCallImpl$1.class > [DEBUG] Rewrote class bytecode: > 
org/apache/calcite/tools/RelBuilder$AggCallImpl.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/ImmutableRelBuilder.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$AggCall.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/ImmutableRelBuilder$1.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/RelBuilder$1.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$GroupKeyImpl.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/RelBuilder$Config.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/ImmutableRelBuilder$Config$InitShim.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$AggCallPlus.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$AggCallImpl2.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/RelBuilder$Shifter.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/tools/RelBuilder$Field.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/tools/RelBuilder$2.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/util/javac/JaninoCompiler$JaninoCompilerArgs.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/util/javac/JaninoCompiler.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/util/javac/JaninoCompiler$AccountingClassLoader.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/adapter/enumerable/EnumerableInterpretable$1$1.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/adapter/enumerable/EnumerableInterpretable$EnumerableNode.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/adapter/enumerable/EnumerableInterpretable$1.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/adapter/enumerable/EnumerableInterpretable$StaticFieldDetector.class > [DEBUG] Rewrote class bytecode: > 
org/apache/calcite/adapter/enumerable/EnumerableInterpretable.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/adapter/enumerable/EnumerableInterpretable$1$1$1.class > [DEBUG] Keeping original class bytecode: > org/apache/calcite/jdbc/CalciteSchemaBuilder.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/rex/RexSimplify$SafeRexVisitor.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/rex/RexSimplify$SargCollector.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/rex/RexSimplify$RexSargBuilder.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/rex/RexSimplify$IsPredicate.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/rex/RexSimplify$Predicate.class > [DEBUG] Rewrote class bytecode: > org/apache/calcite/rex/RexSimplify$Comparison.class > [DEBUG] Keeping original class
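The suggestion above, requiring a minimum JDK patch level so users never hit JDK-8191969, could be enforced at build time with Maven's enforcer plugin. The fragment below is a hypothetical sketch (the exact version floor and message are illustrative, not anything Flink has agreed on):

```xml
<!-- Hypothetical sketch: fail the build on JDK 8 releases older than 8u172,
     where JDK-8191969 (the bug discussed above) is fixed. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-jdk-version</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <requireJavaVersion>
            <!-- 1.8.0_172 or newer, including any later JDK line -->
            <version>[1.8.0_172,)</version>
            <message>Building on JDK 8 requires at least 8u172 (see JDK-8191969).</message>
          </requireJavaVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The `requireJavaVersion` rule uses standard Maven version-range syntax, so one range covers both the patched 8u line and all later JDKs.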
[jira] [Created] (FLINK-32680) Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph
Lijie Wang created FLINK-32680: -- Summary: Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph Key: FLINK-32680 URL: https://issues.apache.org/jira/browse/FLINK-32680 Project: Flink Issue Type: Bug Affects Versions: 1.17.1, 1.16.2, 1.18.0 Reporter: Lijie Wang Attachments: image-2023-07-26-15-01-51-886.png, image-2023-07-26-15-23-29-551.png, image-2023-07-26-15-24-24-077.png

Take the following test (put it into {{MultipleInputITCase}}) as an example:

{code:java}
@Test
public void testMultipleInputDoesNotChainedWithSource() throws Exception {
    testJobVertexName(false);
}

@Test
public void testMultipleInputChainedWithSource() throws Exception {
    testJobVertexName(true);
}

public void testJobVertexName(boolean chain) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);
    TestListResultSink<Long> resultSink = new TestListResultSink<>();

    DataStream<Long> source1 = env.fromSequence(0L, 3L).name("source1");
    DataStream<Long> source2 = env.fromElements(4L, 6L).name("source2");
    DataStream<Long> source3 = env.fromElements(7L, 9L).name("source3");

    KeyedMultipleInputTransformation<Long> transform =
            new KeyedMultipleInputTransformation<>(
                    "MultipleInput",
                    new KeyedSumMultipleInputOperatorFactory(),
                    BasicTypeInfo.LONG_TYPE_INFO,
                    1,
                    BasicTypeInfo.LONG_TYPE_INFO);
    if (chain) {
        transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES);
    }

    KeySelector<Long, Long> keySelector = (KeySelector<Long, Long>) value -> value % 3;
    env.addOperator(
            transform
                    .addInput(source1.getTransformation(), keySelector)
                    .addInput(source2.getTransformation(), keySelector)
                    .addInput(source3.getTransformation(), keySelector));
    new MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink");
    env.execute();
}{code}

When we run {{testMultipleInputDoesNotChainedWithSource}}, all job vertex names are normal. !image-2023-07-26-15-24-24-077.png|width=494,height=246! When we run {{testMultipleInputChainedWithSource}}, job vertex names get messed up (all names contain {{Source: source1}}). I think it's a bug. !image-2023-07-26-15-23-29-551.png|width=515,height=182! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32680) Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph
[ https://issues.apache.org/jira/browse/FLINK-32680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijie Wang updated FLINK-32680: --- Description: Take the following test(put it to {{{}MultipleInputITCase{}}}) as example: {code:java} @Test public void testMultipleInputDoesNotChainedWithSource() throws Exception { testJobVertexName(false); } @Test public void testMultipleInputChainedWithSource() throws Exception { testJobVertexName(true); } public void testJobVertexName(boolean chain) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1); TestListResultSink resultSink = new TestListResultSink<>(); DataStream source1 = env.fromSequence(0L, 3L).name("source1"); DataStream source2 = env.fromElements(4L, 6L).name("source2"); DataStream source3 = env.fromElements(7L, 9L).name("source3"); KeyedMultipleInputTransformation transform = new KeyedMultipleInputTransformation<>( "MultipleInput", new KeyedSumMultipleInputOperatorFactory(), BasicTypeInfo.LONG_TYPE_INFO, 1, BasicTypeInfo.LONG_TYPE_INFO); if (chain) { transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES); } KeySelector keySelector = (KeySelector) value -> value % 3; env.addOperator( transform .addInput(source1.getTransformation(), keySelector) .addInput(source2.getTransformation(), keySelector) .addInput(source3.getTransformation(), keySelector)); new MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink"); env.execute(); }{code} When we run {{{}testMultipleInputDoesNotChainedWithSource{}}}, all job vertex names are normal: !image-2023-07-26-15-24-24-077.png|width=494,height=246! When we run {{{}testMultipleInputChainedWithSource{}}}, job vertex names get messed up (all names contain {{{}Source: source1{}}}): !image-2023-07-26-15-23-29-551.png|width=515,height=182! I think it's a bug. 
was: Take the following test(put it to {{{}MultipleInputITCase{}}}) as example: {code:java} @Test public void testMultipleInputDoesNotChainedWithSource() throws Exception { testJobVertexName(false); } @Test public void testMultipleInputChainedWithSource() throws Exception { testJobVertexName(true); } public void testJobVertexName(boolean chain) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1); TestListResultSink resultSink = new TestListResultSink<>(); DataStream source1 = env.fromSequence(0L, 3L).name("source1"); DataStream source2 = env.fromElements(4L, 6L).name("source2"); DataStream source3 = env.fromElements(7L, 9L).name("source3"); KeyedMultipleInputTransformation transform = new KeyedMultipleInputTransformation<>( "MultipleInput", new KeyedSumMultipleInputOperatorFactory(), BasicTypeInfo.LONG_TYPE_INFO, 1, BasicTypeInfo.LONG_TYPE_INFO); if (chain) { transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES); } KeySelector keySelector = (KeySelector) value -> value % 3; env.addOperator( transform .addInput(source1.getTransformation(), keySelector) .addInput(source2.getTransformation(), keySelector) .addInput(source3.getTransformation(), keySelector)); new MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink"); env.execute(); }{code} When we run {{{}testMultipleInputDoesNotChainedWithSource{}}}, all job vertex names are normal. !image-2023-07-26-15-24-24-077.png|width=494,height=246! When we run {{{}testMultipleInputChainedWithSource{}}}, job vertex names get messed up (all names contain {{{}Source: source1{}}}). I think it's a bug. !image-2023-07-26-15-23-29-551.png|width=515,height=182! 
> Job vertex names get messed up once there is a source vertex chained with a > MultipleInput vertex in job graph > - > > Key: FLINK-32680 > URL: https://issues.apache.org/jira/browse/FLINK-32680 > Project: Flink > Issue Type: Bug >Affects Versions: 1.16.2, 1.18.0, 1.17.1 >Reporter: Lij
[jira] [Updated] (FLINK-32680) Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph
[ https://issues.apache.org/jira/browse/FLINK-32680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijie Wang updated FLINK-32680: --- Attachment: (was: image-2023-07-26-15-01-51-886.png) > Job vertex names get messed up once there is a source vertex chained with a > MultipleInput vertex in job graph > - > > Key: FLINK-32680 > URL: https://issues.apache.org/jira/browse/FLINK-32680 > Project: Flink > Issue Type: Bug >Affects Versions: 1.16.2, 1.18.0, 1.17.1 >Reporter: Lijie Wang >Priority: Major > Attachments: image-2023-07-26-15-23-29-551.png, > image-2023-07-26-15-24-24-077.png > > > Take the following test(put it to {{{}MultipleInputITCase{}}}) as example: > {code:java} > @Test > public void testMultipleInputDoesNotChainedWithSource() throws Exception { > testJobVertexName(false); > } > > @Test > public void testMultipleInputChainedWithSource() throws Exception { > testJobVertexName(true); > } > public void testJobVertexName(boolean chain) throws Exception { > StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > env.setParallelism(1); > TestListResultSink resultSink = new TestListResultSink<>(); > DataStream source1 = env.fromSequence(0L, 3L).name("source1"); > DataStream source2 = env.fromElements(4L, 6L).name("source2"); > DataStream source3 = env.fromElements(7L, 9L).name("source3"); > KeyedMultipleInputTransformation transform = > new KeyedMultipleInputTransformation<>( > "MultipleInput", > new KeyedSumMultipleInputOperatorFactory(), > BasicTypeInfo.LONG_TYPE_INFO, > 1, > BasicTypeInfo.LONG_TYPE_INFO); > if (chain) { > transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES); > } > KeySelector keySelector = (KeySelector) value > -> value % 3; > env.addOperator( > transform > .addInput(source1.getTransformation(), keySelector) > .addInput(source2.getTransformation(), keySelector) > .addInput(source3.getTransformation(), keySelector)); > new > 
MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink"); > env.execute(); > }{code} > > When we run {{{}testMultipleInputDoesNotChainedWithSource{}}}, all job vertex > names are normal: > !image-2023-07-26-15-24-24-077.png|width=494,height=246! > When we run {{{}testMultipleInputChainedWithSource{}}}, job vertex names get > messed up (all names contain {{{}Source: source1{}}}): > !image-2023-07-26-15-23-29-551.png|width=515,height=182! > > I think it's a bug. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-28045) [umbrella] Deprecate SourceFunction API
[ https://issues.apache.org/jira/browse/FLINK-28045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747334#comment-17747334 ] Konstantin Knauf commented on FLINK-28045: -- [~airblader] What we still need to do, though, is actually aligning on the list of these issues/blockers. > [umbrella] Deprecate SourceFunction API > --- > > Key: FLINK-28045 > URL: https://issues.apache.org/jira/browse/FLINK-28045 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / Common >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32680) Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph
[ https://issues.apache.org/jira/browse/FLINK-32680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijie Wang updated FLINK-32680: --- Description: Take the following test(put it to {{MultipleInputITCase}}) as example: {code:java} @Test public void testMultipleInputDoesNotChainedWithSource() throws Exception { testJobVertexName(false); } @Test public void testMultipleInputChainedWithSource() throws Exception { testJobVertexName(true); } public void testJobVertexName(boolean chain) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1); TestListResultSink resultSink = new TestListResultSink<>(); DataStream source1 = env.fromSequence(0L, 3L).name("source1"); DataStream source2 = env.fromElements(4L, 6L).name("source2"); DataStream source3 = env.fromElements(7L, 9L).name("source3"); KeyedMultipleInputTransformation transform = new KeyedMultipleInputTransformation<>( "MultipleInput", new KeyedSumMultipleInputOperatorFactory(), BasicTypeInfo.LONG_TYPE_INFO, 1, BasicTypeInfo.LONG_TYPE_INFO); if (chain) { transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES); } KeySelector keySelector = (KeySelector) value -> value % 3; env.addOperator( transform .addInput(source1.getTransformation(), keySelector) .addInput(source2.getTransformation(), keySelector) .addInput(source3.getTransformation(), keySelector)); new MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink"); env.execute(); }{code} When we run {{testMultipleInputDoesNotChainedWithSource}} , all job vertex names are normal: !image-2023-07-26-15-24-24-077.png|width=494,height=246! When we run {{testMultipleInputChainedWithSource}} (the MultipleInput chained with source1), job vertex names get messed up (all job vertex names contain {{{}Source: source1{}}}): !image-2023-07-26-15-23-29-551.png|width=515,height=182! I think it's a bug. 
was: Take the following test(put it to {{{}MultipleInputITCase{}}}) as example: {code:java} @Test public void testMultipleInputDoesNotChainedWithSource() throws Exception { testJobVertexName(false); } @Test public void testMultipleInputChainedWithSource() throws Exception { testJobVertexName(true); } public void testJobVertexName(boolean chain) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1); TestListResultSink resultSink = new TestListResultSink<>(); DataStream source1 = env.fromSequence(0L, 3L).name("source1"); DataStream source2 = env.fromElements(4L, 6L).name("source2"); DataStream source3 = env.fromElements(7L, 9L).name("source3"); KeyedMultipleInputTransformation transform = new KeyedMultipleInputTransformation<>( "MultipleInput", new KeyedSumMultipleInputOperatorFactory(), BasicTypeInfo.LONG_TYPE_INFO, 1, BasicTypeInfo.LONG_TYPE_INFO); if (chain) { transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES); } KeySelector keySelector = (KeySelector) value -> value % 3; env.addOperator( transform .addInput(source1.getTransformation(), keySelector) .addInput(source2.getTransformation(), keySelector) .addInput(source3.getTransformation(), keySelector)); new MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink"); env.execute(); }{code} When we run {{{}testMultipleInputDoesNotChainedWithSource{}}}, all job vertex names are normal: !image-2023-07-26-15-24-24-077.png|width=494,height=246! When we run {{{}testMultipleInputChainedWithSource{}}}, job vertex names get messed up (all names contain {{{}Source: source1{}}}): !image-2023-07-26-15-23-29-551.png|width=515,height=182! I think it's a bug. 
> Job vertex names get messed up once there is a source vertex chained with a > MultipleInput vertex in job graph > - > > Key: FLINK-32680 > URL: https://issues.apache.org/jira/browse/FLINK-32680 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1
[jira] [Commented] (FLINK-15736) Support Java 17 (LTS)
[ https://issues.apache.org/jira/browse/FLINK-15736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747337#comment-17747337 ] Chesnay Schepler commented on FLINK-15736: -- There should be a troubleshooting entry somewhere for errors related to the JDK modularization. > Support Java 17 (LTS) > - > > Key: FLINK-15736 > URL: https://issues.apache.org/jira/browse/FLINK-15736 > Project: Flink > Issue Type: New Feature > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: auto-deprioritized-major, pull-request-available, > stale-assigned > Fix For: 1.18.0 > > > Long-term issue for preparing Flink for Java 17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
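Such a troubleshooting entry would presumably cover `InaccessibleObjectException` / `IllegalAccessError` failures caused by the JDK module system blocking reflective access on newer JDKs, for which the usual workaround is passing `--add-opens` JVM flags. The fragment below is a hypothetical sketch; the config key and the exact module list are assumptions for illustration and depend on the Flink version and the job:

```yaml
# Hypothetical flink-conf.yaml fragment (key name and module list assumed):
# open the JDK packages that Flink's reflective serializers access.
env.java.opts.all: >-
  --add-opens=java.base/java.lang=ALL-UNNAMED
  --add-opens=java.base/java.util=ALL-UNNAMED
  --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
```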
[jira] [Created] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
Chesnay Schepler created FLINK-32681: Summary: RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable Key: FLINK-32681 URL: https://issues.apache.org/jira/browse/FLINK-32681 Project: Flink Issue Type: Technical Debt Components: Runtime / State Backends, Tests Affects Versions: 1.18.0 Reporter: Chesnay Schepler Fix For: 1.18.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51712&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
[ https://issues.apache.org/jira/browse/FLINK-32681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler updated FLINK-32681: - Description: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51712&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef Failed 3 times in yesterday's nightly run. {code} Jul 26 01:12:46 01:12:46.889 [ERROR] org.apache.flink.contrib.streaming.state.RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure Time elapsed: 0.044 s <<< FAILURE! Jul 26 01:12:46 java.lang.AssertionError Jul 26 01:12:46 at org.junit.Assert.fail(Assert.java:87) Jul 26 01:12:46 at org.junit.Assert.assertTrue(Assert.java:42) Jul 26 01:12:46 at org.junit.Assert.assertFalse(Assert.java:65) Jul 26 01:12:46 at org.junit.Assert.assertFalse(Assert.java:75) Jul 26 01:12:46 at org.apache.flink.contrib.streaming.state.RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure(RocksDBStateDownloaderTest.java:151) {code} was:https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51712&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef > RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable > > > Key: FLINK-32681 > URL: https://issues.apache.org/jira/browse/FLINK-32681 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / State Backends, Tests >Affects Versions: 1.18.0 >Reporter: Chesnay Schepler >Priority: Major > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51712&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef > Failed 3 times in yesterday's nightly run. > {code} > Jul 26 01:12:46 01:12:46.889 [ERROR] > org.apache.flink.contrib.streaming.state.RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure > Time elapsed: 0.044 s <<< FAILURE! 
> Jul 26 01:12:46 java.lang.AssertionError > Jul 26 01:12:46 at org.junit.Assert.fail(Assert.java:87) > Jul 26 01:12:46 at org.junit.Assert.assertTrue(Assert.java:42) > Jul 26 01:12:46 at org.junit.Assert.assertFalse(Assert.java:65) > Jul 26 01:12:46 at org.junit.Assert.assertFalse(Assert.java:75) > Jul 26 01:12:46 at > org.apache.flink.contrib.streaming.state.RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure(RocksDBStateDownloaderTest.java:151) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-27054) Elasticsearch SQL connector SSL issue
[ https://issues.apache.org/jira/browse/FLINK-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747360#comment-17747360 ] Kelu Tao commented on FLINK-27054: -- [~martijnvisser] OK, please assign it to me. Thanks. > Elasticsearch SQL connector SSL issue > - > > Key: FLINK-27054 > URL: https://issues.apache.org/jira/browse/FLINK-27054 > Project: Flink > Issue Type: Bug > Components: Connectors / ElasticSearch >Reporter: ricardo >Priority: Major > > The current Flink ElasticSearch SQL connector > https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/elasticsearch/ > is missing SSL options, so it can't connect to ES clusters that require an > SSL certificate. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-27054) Elasticsearch SQL connector SSL issue
[ https://issues.apache.org/jira/browse/FLINK-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser reassigned FLINK-27054: -- Assignee: Kelu Tao > Elasticsearch SQL connector SSL issue > - > > Key: FLINK-27054 > URL: https://issues.apache.org/jira/browse/FLINK-27054 > Project: Flink > Issue Type: Bug > Components: Connectors / ElasticSearch >Reporter: ricardo >Assignee: Kelu Tao >Priority: Major > > The current Flink ElasticSearch SQL connector > https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/elasticsearch/ > is missing SSL options, so it can't connect to ES clusters that require an > SSL certificate. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #637: [FLINK-29634] Support periodic checkpoint triggering
gyfora commented on code in PR #637: URL: https://github.com/apache/flink-kubernetes-operator/pull/637#discussion_r1274542988 ## flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/reconciler/sessionjob/SessionJobReconcilerTest.java: ## @@ -409,24 +417,143 @@ public void testTriggerSavepoint() throws Exception { sp1SessionJob, JobState.RUNNING, RECONCILING.name(), null, flinkService.listJobs()); sp1SessionJob.getStatus().getJobStatus().getSavepointInfo().resetTrigger(); -ReconciliationUtils.updateLastReconciledSavepointTriggerNonce( -sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), sp1SessionJob); +ReconciliationUtils.updateLastReconciledSnapshotTriggerNonce( +sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), +sp1SessionJob, +SAVEPOINT); // trigger when new nonce is defined sp1SessionJob.getSpec().getJob().setSavepointTriggerNonce(4L); reconciler.reconcile(sp1SessionJob, readyContext); assertEquals( -"trigger_1", +"savepoint_trigger_1", sp1SessionJob.getStatus().getJobStatus().getSavepointInfo().getTriggerId()); sp1SessionJob.getStatus().getJobStatus().getSavepointInfo().resetTrigger(); -ReconciliationUtils.updateLastReconciledSavepointTriggerNonce( -sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), sp1SessionJob); +ReconciliationUtils.updateLastReconciledSnapshotTriggerNonce( +sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), +sp1SessionJob, +SAVEPOINT); // don't trigger when nonce is cleared sp1SessionJob.getSpec().getJob().setSavepointTriggerNonce(null); reconciler.reconcile(sp1SessionJob, readyContext); - assertFalse(SavepointUtils.savepointInProgress(sp1SessionJob.getStatus().getJobStatus())); + assertFalse(SnapshotUtils.savepointInProgress(sp1SessionJob.getStatus().getJobStatus())); +} + +@Test +public void testTriggerCheckpoint() throws Exception { +FlinkSessionJob sessionJob = TestUtils.buildSessionJob(); + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sessionJob))); + +var 
readyContext = TestUtils.createContextWithReadyFlinkDeployment(); +reconciler.reconcile(sessionJob, readyContext); +verifyAndSetRunningJobsToStatus( +sessionJob, JobState.RUNNING, RECONCILING.name(), null, flinkService.listJobs()); + + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sessionJob))); + +// trigger checkpoint +var sp1SessionJob = ReconciliationUtils.clone(sessionJob); + +// do not trigger checkpoint if nonce is null +reconciler.reconcile(sp1SessionJob, readyContext); + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sp1SessionJob))); + +getJobSpec(sp1SessionJob).setCheckpointTriggerNonce(2L); +getJobStatus(sp1SessionJob).setState(CREATED.name()); +reconciler.reconcile(sp1SessionJob, readyContext); +// do not trigger checkpoint if job is not running + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sp1SessionJob))); + +getJobStatus(sp1SessionJob).setState(RUNNING.name()); + +reconciler.reconcile(sp1SessionJob, readyContext); + assertTrue(SnapshotUtils.checkpointInProgress(getJobStatus(sp1SessionJob))); + +// the last reconcile nonce updated + assertNull(getReconciledJobSpec(sp1SessionJob).getCheckpointTriggerNonce()); + +// don't trigger new checkpoint when checkpoint is in progress +getJobSpec(sp1SessionJob).setCheckpointTriggerNonce(3L); +reconciler.reconcile(sp1SessionJob, readyContext); +assertEquals("checkpoint_trigger_0", getCheckpointInfo(sp1SessionJob).getTriggerId()); +/* +TODO: this section needs to be reintroduced in case the LAST_STATE optimization gets + added Review Comment: Sorry, what optimisation are we talking about here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] patricklucas commented on a diff in pull request #22987: [FLINK-32583][rest] Fix deadlock in RestClient
patricklucas commented on code in PR #22987: URL: https://github.com/apache/flink/pull/22987#discussion_r1274549521 ## flink-runtime/src/test/java/org/apache/flink/runtime/rest/RestClientTest.java: ## @@ -207,6 +210,40 @@ public void testRestClientClosedHandling() throws Exception { } } +/** + * Tests that the futures returned by {@link RestClient} fail immediately if the client is + * already closed. + * + * See FLINK-32583 + */ +@Test +public void testCloseClientBeforeRequest() throws Exception { Review Comment: Might be able to get the same behavior with just a `@VisibleForTesting` getter for `responseChannelFutures`, I'll give it a shot. Always makes me uneasy allowing injection or adding extra steps to initializing `ConcurrentXyz` fields, even for testing, since you can't enforce the concurrency-safety with the type system. ## flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClient.java: ## @@ -150,11 +159,30 @@ public static RestClient forUrl(Configuration configuration, Executor executor, public RestClient(Configuration configuration, Executor executor) throws ConfigurationException { -this(configuration, executor, null, -1); +this(configuration, executor, null, -1, DefaultSelectStrategyFactory.INSTANCE); Review Comment: Once again, just wasn't sure what Flink's preferred style is—reducing the number of constructor indirections or reducing the number of default values materialized. :) Happy to change. 
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
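The constructor-style trade-off discussed in this review (fewer constructor indirections vs. fewer places where default values are materialized) can be sketched outside of Flink. A minimal, hypothetical example, where the class and field names are illustrative and do not reproduce Flink's actual `RestClient`: every convenience constructor delegates to one canonical constructor, so each default is defined in exactly one place at the cost of one extra level of indirection per overload.

```java
// Hypothetical sketch of telescoping constructors delegating to one
// canonical constructor; NOT Flink's actual RestClient code.
class Client {
    interface SelectStrategy {}

    static final SelectStrategy DEFAULT_STRATEGY = new SelectStrategy() {};

    private final String host;
    private final int port;
    private final SelectStrategy strategy;

    // Convenience constructor: default values are materialized here, once,
    // at the cost of one extra level of indirection.
    Client(String host) {
        this(host, -1, DEFAULT_STRATEGY);
    }

    // Canonical constructor: the only place the fields are assigned.
    Client(String host, int port, SelectStrategy strategy) {
        this.host = host;
        this.port = port;
        this.strategy = strategy;
    }

    int getPort() {
        return port;
    }

    SelectStrategy getStrategy() {
        return strategy;
    }
}
```

Either style is defensible; delegating to a single canonical constructor keeps every default in one place, which is one reasonable reading of the preference being discussed.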
[jira] [Created] (FLINK-32682) Introduce option for choosing time function evaluation methods
Dawid Wysakowicz created FLINK-32682: Summary: Introduce option for choosing time function evaluation methods Key: FLINK-32682 URL: https://issues.apache.org/jira/browse/FLINK-32682 Project: Flink Issue Type: New Feature Components: Table SQL / Planner, Table SQL / Runtime Reporter: Dawid Wysakowicz Fix For: 1.18.0 In FLIP-162 as future plans it was discussed to introduce an option {{table.exec.time-function-evaluation}} to control evaluation method of time function. We should add this option to be able to evaluate time functions with {{query-time}} method in streaming mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32682) Introduce option for choosing time function evaluation methods
[ https://issues.apache.org/jira/browse/FLINK-32682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz updated FLINK-32682: - Description: In [FLIP-162|https://cwiki.apache.org/confluence/x/KAxRCg] as future plans it was discussed to introduce an option {{table.exec.time-function-evaluation}} to control evaluation method of time function. We should add this option to be able to evaluate time functions with {{query-time}} method in streaming mode. was: In FLIP-162 as future plans it was discussed to introduce an option {{table.exec.time-function-evaluation}} to control evaluation method of time function. We should add this option to be able to evaluate time functions with {{query-time}} method in streaming mode. > Introduce option for choosing time function evaluation methods > -- > > Key: FLINK-32682 > URL: https://issues.apache.org/jira/browse/FLINK-32682 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner, Table SQL / Runtime >Reporter: Dawid Wysakowicz >Priority: Major > Fix For: 1.18.0 > > > In [FLIP-162|https://cwiki.apache.org/confluence/x/KAxRCg] as future plans it > was discussed to introduce an option {{table.exec.time-function-evaluation}} > to control evaluation method of time function. > We should add this option to be able to evaluate time functions with > {{query-time}} method in streaming mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
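As a standalone illustration of the semantics the option above would control (this is not Flink planner code; a deterministic counter stands in for the wall clock so the difference is observable): with {{query-time}} evaluation a time function is evaluated once when the query starts and every row observes the same value, while the streaming default re-evaluates it for every record.

```java
// Standalone sketch of the two evaluation methods FLIP-162 describes for
// time functions such as CURRENT_TIMESTAMP; NOT Flink planner code.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

class TimeFunctionEvaluation {
    // Deterministic "clock" standing in for the wall clock.
    static final AtomicLong CLOCK = new AtomicLong();

    // query-time: evaluate the function once when the query starts;
    // every row then observes the same value.
    static List<Long> queryTime(int rows, Supplier<Long> timeFn) {
        long fixed = timeFn.get();
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(fixed);
        }
        return out;
    }

    // per-record (the streaming default): re-evaluate for every row,
    // so each row can observe a different value.
    static List<Long> perRecord(int rows, Supplier<Long> timeFn) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(timeFn.get());
        }
        return out;
    }
}
```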
[GitHub] [flink] XComp commented on a diff in pull request #22987: [FLINK-32583][rest] Fix deadlock in RestClient
XComp commented on code in PR #22987: URL: https://github.com/apache/flink/pull/22987#discussion_r1274576162 ## flink-runtime/src/test/java/org/apache/flink/runtime/rest/RestClientTest.java: ## @@ -207,6 +210,40 @@ public void testRestClientClosedHandling() throws Exception { } } +/** + * Tests that the futures returned by {@link RestClient} fail immediately if the client is + * already closed. + * + * See FLINK-32583 + */ +@Test +public void testCloseClientBeforeRequest() throws Exception { Review Comment: that's a fair point - I know what you mean. :-D -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-32670) Annotate interfaces that inherit from SourceFunction as deprecated
[ https://issues.apache.org/jira/browse/FLINK-32670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Fedulov reassigned FLINK-32670: - Assignee: Alexander Fedulov > Annotate interfaces that inherit from SourceFunction as deprecated > --- > > Key: FLINK-32670 > URL: https://issues.apache.org/jira/browse/FLINK-32670 > Project: Flink > Issue Type: Sub-task >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > > ParallelSourceFunction, RichParallelSourceFunction, ExternallyInducedSource -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] XComp commented on a diff in pull request #22987: [FLINK-32583][rest] Fix deadlock in RestClient
XComp commented on code in PR #22987: URL: https://github.com/apache/flink/pull/22987#discussion_r1274576162 ## flink-runtime/src/test/java/org/apache/flink/runtime/rest/RestClientTest.java: ## @@ -207,6 +210,40 @@ public void testRestClientClosedHandling() throws Exception { } } +/** + * Tests that the futures returned by {@link RestClient} fail immediately if the client is + * already closed. + * + * See FLINK-32583 + */ +@Test +public void testCloseClientBeforeRequest() throws Exception { Review Comment: that's a fair point - I know what you mean. :-D You could use JavaDoc to underline that. But using the compiler as a safeguard is a better option, I guess. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-32645) Flink pulsar sink is having poor performance
[ https://issues.apache.org/jira/browse/FLINK-32645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zili Chen reassigned FLINK-32645: - Assignee: Zili Chen > Flink pulsar sink is having poor performance > > > Key: FLINK-32645 > URL: https://issues.apache.org/jira/browse/FLINK-32645 > Project: Flink > Issue Type: Bug > Components: Connectors / Pulsar >Affects Versions: 1.16.2 > Environment: !Screenshot 2023-07-22 at 1.59.42 PM.png!!Screenshot > 2023-07-22 at 2.03.53 PM.png! > >Reporter: Vijaya Bhaskar V >Assignee: Zili Chen >Priority: Major > Attachments: Screenshot 2023-07-22 at 2.03.53 PM.png, Screenshot > 2023-07-22 at 2.56.55 PM.png, Screenshot 2023-07-22 at 3.45.21 PM-1.png, > Screenshot 2023-07-22 at 3.45.21 PM.png, pom.xml > > > Found the following issue with the Flink Pulsar sink: > > The Flink Pulsar sink is always waiting while enqueueing messages, keeping > the task slot busy no matter how many free slots we provide. A screenshot of > this is attached. > When sending messages at a modest rate of 8K msg/sec, a standalone Flink job > with a discarding sink is able to receive the full rate of 8K msg/sec. > Whereas the Pulsar sink was consuming only up to 2K msg/sec, and the sink is > always busy waiting. A snapshot of the thread dump is attached. > A snapshot of the Flink stream graph is also attached. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] afedulov commented on a diff in pull request #637: [FLINK-29634] Support periodic checkpoint triggering
afedulov commented on code in PR #637: URL: https://github.com/apache/flink-kubernetes-operator/pull/637#discussion_r1274603164 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/SnapshotObserver.java: ## @@ -74,8 +80,32 @@ public void observeSavepointStatus(FlinkResourceContext ctx) { cleanupSavepointHistory(ctx, savepointInfo); } +public void observeCheckpointStatus(FlinkResourceContext ctx) { +if (!isCheckpointsTriggeringSupported(ctx.getObserveConfig())) { +return; +} +var resource = ctx.getResource(); +var jobStatus = resource.getStatus().getJobStatus(); +var checkpointInfo = jobStatus.getSavepointInfo(); +var jobId = jobStatus.getJobId(); + +// If any manual or periodic savepoint is in progress, observe it +if (SnapshotUtils.checkpointInProgress(jobStatus)) { +observeTriggeredCheckpoint(ctx, jobId); +} + +// REVIEW: clarify if this is relevant for checkpoints. +/* +// If job is in globally terminal state, observe last savepoint +if (ReconciliationUtils.isJobInTerminalState(resource.getStatus())) { +observeLatestCheckpoint( +ctx.getFlinkService(), checkpointInfo, jobId, ctx.getObserveConfig()); +} +*/ Review Comment: So, it seems that `observeLatestSavepoint` therefore already covers all of the lifecycle and recovery needs and we do not necessarily need to introduce the `observeLatestCheckpoint ` method, am I right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #637: [FLINK-29634] Support periodic checkpoint triggering
gyfora commented on code in PR #637: URL: https://github.com/apache/flink-kubernetes-operator/pull/637#discussion_r1274612726 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/SnapshotObserver.java: ## @@ -74,8 +80,32 @@ public void observeSavepointStatus(FlinkResourceContext ctx) { cleanupSavepointHistory(ctx, savepointInfo); } +public void observeCheckpointStatus(FlinkResourceContext ctx) { +if (!isCheckpointsTriggeringSupported(ctx.getObserveConfig())) { +return; +} +var resource = ctx.getResource(); +var jobStatus = resource.getStatus().getJobStatus(); +var checkpointInfo = jobStatus.getSavepointInfo(); +var jobId = jobStatus.getJobId(); + +// If any manual or periodic savepoint is in progress, observe it +if (SnapshotUtils.checkpointInProgress(jobStatus)) { +observeTriggeredCheckpoint(ctx, jobId); +} + +// REVIEW: clarify if this is relevant for checkpoints. +/* +// If job is in globally terminal state, observe last savepoint +if (ReconciliationUtils.isJobInTerminalState(resource.getStatus())) { +observeLatestCheckpoint( +ctx.getFlinkService(), checkpointInfo, jobId, ctx.getObserveConfig()); +} +*/ Review Comment: yes, I believe so -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] afedulov commented on a diff in pull request #637: [FLINK-29634] Support periodic checkpoint triggering
afedulov commented on code in PR #637: URL: https://github.com/apache/flink-kubernetes-operator/pull/637#discussion_r1274616976 ## flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/reconciler/sessionjob/SessionJobReconcilerTest.java: ## @@ -409,24 +417,143 @@ public void testTriggerSavepoint() throws Exception { sp1SessionJob, JobState.RUNNING, RECONCILING.name(), null, flinkService.listJobs()); sp1SessionJob.getStatus().getJobStatus().getSavepointInfo().resetTrigger(); -ReconciliationUtils.updateLastReconciledSavepointTriggerNonce( -sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), sp1SessionJob); +ReconciliationUtils.updateLastReconciledSnapshotTriggerNonce( +sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), +sp1SessionJob, +SAVEPOINT); // trigger when new nonce is defined sp1SessionJob.getSpec().getJob().setSavepointTriggerNonce(4L); reconciler.reconcile(sp1SessionJob, readyContext); assertEquals( -"trigger_1", +"savepoint_trigger_1", sp1SessionJob.getStatus().getJobStatus().getSavepointInfo().getTriggerId()); sp1SessionJob.getStatus().getJobStatus().getSavepointInfo().resetTrigger(); -ReconciliationUtils.updateLastReconciledSavepointTriggerNonce( -sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), sp1SessionJob); +ReconciliationUtils.updateLastReconciledSnapshotTriggerNonce( +sp1SessionJob.getStatus().getJobStatus().getSavepointInfo(), +sp1SessionJob, +SAVEPOINT); // don't trigger when nonce is cleared sp1SessionJob.getSpec().getJob().setSavepointTriggerNonce(null); reconciler.reconcile(sp1SessionJob, readyContext); - assertFalse(SavepointUtils.savepointInProgress(sp1SessionJob.getStatus().getJobStatus())); + assertFalse(SnapshotUtils.savepointInProgress(sp1SessionJob.getStatus().getJobStatus())); +} + +@Test +public void testTriggerCheckpoint() throws Exception { +FlinkSessionJob sessionJob = TestUtils.buildSessionJob(); + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sessionJob))); + 
+var readyContext = TestUtils.createContextWithReadyFlinkDeployment(); +reconciler.reconcile(sessionJob, readyContext); +verifyAndSetRunningJobsToStatus( +sessionJob, JobState.RUNNING, RECONCILING.name(), null, flinkService.listJobs()); + + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sessionJob))); + +// trigger checkpoint +var sp1SessionJob = ReconciliationUtils.clone(sessionJob); + +// do not trigger checkpoint if nonce is null +reconciler.reconcile(sp1SessionJob, readyContext); + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sp1SessionJob))); + +getJobSpec(sp1SessionJob).setCheckpointTriggerNonce(2L); +getJobStatus(sp1SessionJob).setState(CREATED.name()); +reconciler.reconcile(sp1SessionJob, readyContext); +// do not trigger checkpoint if job is not running + assertFalse(SnapshotUtils.checkpointInProgress(getJobStatus(sp1SessionJob))); + +getJobStatus(sp1SessionJob).setState(RUNNING.name()); + +reconciler.reconcile(sp1SessionJob, readyContext); + assertTrue(SnapshotUtils.checkpointInProgress(getJobStatus(sp1SessionJob))); + +// the last reconcile nonce updated + assertNull(getReconciledJobSpec(sp1SessionJob).getCheckpointTriggerNonce()); + +// don't trigger new checkpoint when checkpoint is in progress +getJobSpec(sp1SessionJob).setCheckpointTriggerNonce(3L); +reconciler.reconcile(sp1SessionJob, readyContext); +assertEquals("checkpoint_trigger_0", getCheckpointInfo(sp1SessionJob).getTriggerId()); +/* +TODO: this section needs to be reintroduced in case the LAST_STATE optimization gets + added Review Comment: To block upgrades until checkpointing is complete if the LATEST_STATE is enabled https://a1350286.slack.com/archives/C02V70Y3VAQ/p1689850156914949?thread_ts=1689781790.906109&cid=C02V70Y3VAQ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] patricklucas commented on pull request #22987: [FLINK-32583][rest] Fix deadlock in RestClient
patricklucas commented on PR #22987: URL: https://github.com/apache/flink/pull/22987#issuecomment-1651315710 @XComp Made the following changes: - Updated `testCloseClientWhileProcessingRequest` to check that `responseChannelFutures` is empty before and size 1 after the call to `sendRequest` - Added a test that preloads a response channel future and makes sure it's resolved as expected - To accomplish that, added `@VisibleForTesting RestClient#getResponseChannelFutures()` - To keep consistency, I changed one other field which was `@VisibleForTesting`, but non-final and without a getter, hope this is okay - Making the field final should be a strict improvement, but being consistent about adding a getter rather than exposing a bare field also seems good - I also made a symmetric change in `RestServerEndpoint`—the diff should be clear as to why that made sense It occurred to me that there is actually no test in `RestClientTest` that actually tests the happy path. If there were, I would have copied it to test that the response channel future is added and also cleared when the client is not closed, but the request completes successfully. I'm not super motivated to do that right now as I'm very busy and have already put a good bit of time into this change—what do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
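The pattern described in this comment, a final concurrent field exposed through a test-visible getter rather than made injectable, can be sketched in isolation. All names below are illustrative and do not reproduce Flink's actual `RestClient`:

```java
// Hypothetical sketch: keep the concurrent field final and expose it via a
// getter for tests instead of allowing injection. NOT Flink's RestClient.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

class SketchRestClient {
    // final: the type system guarantees the field can never be replaced,
    // so tests cannot accidentally break its concurrency-safety.
    private final ConcurrentLinkedQueue<CompletableFuture<String>> responseChannelFutures =
            new ConcurrentLinkedQueue<>();

    CompletableFuture<String> sendRequest() {
        CompletableFuture<String> future = new CompletableFuture<>();
        responseChannelFutures.add(future);
        return future;
    }

    void close() {
        // Fail all pending futures so callers are not left waiting forever.
        CompletableFuture<String> future;
        while ((future = responseChannelFutures.poll()) != null) {
            future.completeExceptionally(new IllegalStateException("client closed"));
        }
    }

    // The role a @VisibleForTesting getter plays in the actual change:
    // let a test observe the internal collection without injecting it.
    ConcurrentLinkedQueue<CompletableFuture<String>> getResponseChannelFutures() {
        return responseChannelFutures;
    }
}
```

A test can then assert the collection is empty before a request, has size 1 after `sendRequest()`, and is drained (with the future failed) after `close()`, which mirrors the test changes listed above.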
[jira] [Created] (FLINK-32683) Update Pekko from 1.0.0 to 1.0.1
Matthew de Detrich created FLINK-32683: -- Summary: Update Pekko from 1.0.0 to 1.0.1 Key: FLINK-32683 URL: https://issues.apache.org/jira/browse/FLINK-32683 Project: Flink Issue Type: Improvement Reporter: Matthew de Detrich Updates Pekko dependency to 1.0.1 which contains the following bugfix [https://github.com/apache/incubator-pekko/pull/492] . See [https://github.com/apache/incubator-pekko/issues/491] for more info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] afedulov opened a new pull request, #23079: [FLINK-32670] Annotate interfaces that inherit from SourceFunction as deprecated
afedulov opened a new pull request, #23079: URL: https://github.com/apache/flink/pull/23079 This is a trivial change that marks sub-interfaces of SourceFunction as `@Deprecated` and adjusts an example to fix the strict deprecation compiler check failure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32670) Annotate interfaces that inherit from SourceFunction as deprecated
[ https://issues.apache.org/jira/browse/FLINK-32670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32670: --- Labels: pull-request-available (was: ) > Annotate interfaces that inherit from SourceFunction as deprecated > --- > > Key: FLINK-32670 > URL: https://issues.apache.org/jira/browse/FLINK-32670 > Project: Flink > Issue Type: Sub-task >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > Labels: pull-request-available > > ParallelSourceFunction, RichParallelSourceFunction, ExternallyInducedSource -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] flinkbot commented on pull request #23079: [FLINK-32670] Annotate interfaces that inherit from SourceFunction as deprecated
flinkbot commented on PR #23079: URL: https://github.com/apache/flink/pull/23079#issuecomment-1651344213 ## CI report: * f3a2c5c03093627082969da5e3b46cef4443fd5b UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] mdedetrich opened a new pull request, #23080: [FLINK-32683][rpc] Update Pekko from 1.0.0 to 1.0.1
mdedetrich opened a new pull request, #23080: URL: https://github.com/apache/flink/pull/23080 ## What is the purpose of the change This PR bumps Pekko from v1.0.0 to 1.0.1. The main and hence only change from Pekko 1.0.0 to Pekko 1.0.1 is https://github.com/apache/incubator-pekko/pull/492, see issue at https://github.com/apache/incubator-pekko/issues/491. Just to provide some context, the core issue is that using `$` in any symbol name for user defined classes breaks the JVM spec (initially it was thought that it only broke the Scala spec but this was later clarified in https://github.com/lampepfl/dotty/issues/18234#issuecomment-1639861800 ). While it is highly unlikely that using Pekko 1.0.0 will cause any problems, if there does happen to be an issue theoretically speaking it wouldn't be self contained to only Scala users. ## Brief change log * Update Pekko dependency to 1.0.1 as well as updating `NOTICE` file ## Verifying this change Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing This change is already covered by existing tests, such as *(please describe tests)*. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): yes - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32683) Update Pekko from 1.0.0 to 1.0.1
[ https://issues.apache.org/jira/browse/FLINK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32683: --- Labels: pull-request-available (was: ) > Update Pekko from 1.0.0 to 1.0.1 > > > Key: FLINK-32683 > URL: https://issues.apache.org/jira/browse/FLINK-32683 > Project: Flink > Issue Type: Improvement >Reporter: Matthew de Detrich >Priority: Major > Labels: pull-request-available > > Updates Pekko dependency to 1.0.1 which contains the following bugfix > [https://github.com/apache/incubator-pekko/pull/492] . See > [https://github.com/apache/incubator-pekko/issues/491] for more info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32680) Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph
[ https://issues.apache.org/jira/browse/FLINK-32680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747409#comment-17747409 ] Junrui Li commented on FLINK-32680: --- This bug occurs because the global ChainedSources are used when generating the JobVertex name ([here|https://github.com/apache/flink/blob/c8ae39d4ac73f81873e1d8ac37e17c29ae330b23/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L903]). In fact, they should be filtered according to the id of the current node. An example to refer to is [here|https://github.com/apache/flink/blob/c8ae39d4ac73f81873e1d8ac37e17c29ae330b23/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L1052]. If this idea is correct, I will prepare a PR to fix this issue. > Job vertex names get messed up once there is a source vertex chained with a > MultipleInput vertex in job graph > - > > Key: FLINK-32680 > URL: https://issues.apache.org/jira/browse/FLINK-32680 > Project: Flink > Issue Type: Bug >Affects Versions: 1.16.2, 1.18.0, 1.17.1 >Reporter: Lijie Wang >Priority: Major > Attachments: image-2023-07-26-15-23-29-551.png, > image-2023-07-26-15-24-24-077.png > > > Take the following test (put it into {{MultipleInputITCase}}) as an example: > {code:java} > @Test > public void testMultipleInputDoesNotChainedWithSource() throws Exception { > testJobVertexName(false); > } > > @Test > public void testMultipleInputChainedWithSource() throws Exception { > testJobVertexName(true); > } > public void testJobVertexName(boolean chain) throws Exception { > StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > env.setParallelism(1); > TestListResultSink<Long> resultSink = new TestListResultSink<>(); > DataStream<Long> source1 = env.fromSequence(0L, 3L).name("source1"); > DataStream<Long> source2 = env.fromElements(4L, 6L).name("source2"); > DataStream<Long> source3 = env.fromElements(7L, 9L).name("source3"); > KeyedMultipleInputTransformation<Long> transform = > new KeyedMultipleInputTransformation<>( > "MultipleInput", > new KeyedSumMultipleInputOperatorFactory(), > BasicTypeInfo.LONG_TYPE_INFO, > 1, > BasicTypeInfo.LONG_TYPE_INFO); > if (chain) { > transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES); > } > KeySelector<Long, Long> keySelector = (KeySelector<Long, Long>) value > -> value % 3; > env.addOperator( > transform > .addInput(source1.getTransformation(), keySelector) > .addInput(source2.getTransformation(), keySelector) > .addInput(source3.getTransformation(), keySelector)); > new > MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink"); > env.execute(); > }{code} > > When we run {{testMultipleInputDoesNotChainedWithSource}}, all job vertex > names are normal: > !image-2023-07-26-15-24-24-077.png|width=494,height=246! > When we run {{testMultipleInputChainedWithSource}} (the MultipleInput chained > with source1), job vertex names get messed up (all job vertex names contain > {{{}Source: source1{}}}): > !image-2023-07-26-15-23-29-551.png|width=515,height=182! > > I think it's a bug. -- This message was sent by Atlassian Jira (v8.20.10#820010)
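The fix this comment outlines, filtering the global chained-sources map by the current node instead of using it wholesale, can be sketched with plain collections. Names and structure are hypothetical, not `StreamingJobGraphGenerator` code:

```java
// Plain-collections sketch of the bug and fix described above: a vertex name
// built from the global chained-sources map absorbs sources that do not feed
// the vertex; filtering by the node's own input ids fixes it.
// Hypothetical names; NOT StreamingJobGraphGenerator code.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class VertexNaming {
    // Buggy variant: every vertex name picks up every chained source.
    static String nameFromGlobal(String operator, Map<Integer, String> chainedSources) {
        return String.join(", ", chainedSources.values()) + " -> " + operator;
    }

    // Fixed variant: only sources actually chained into this node contribute.
    static String nameFiltered(
            String operator, List<Integer> nodeInputIds, Map<Integer, String> chainedSources) {
        String sources =
                nodeInputIds.stream()
                        .filter(chainedSources::containsKey)
                        .map(chainedSources::get)
                        .collect(Collectors.joining(", "));
        return sources.isEmpty() ? operator : sources + " -> " + operator;
    }
}
```

With one chained source `"Source: source1"` under node id 1, the buggy variant stamps it into the name of every vertex (including the sink), while the filtered variant names only the MultipleInput vertex with it, matching the behavior difference shown in the attached screenshots.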
[jira] [Commented] (FLINK-32680) Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph
[ https://issues.apache.org/jira/browse/FLINK-32680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747411#comment-17747411 ] Lijie Wang commented on FLINK-32680:
---
Thanks [~JunRuiLi], assigned to you.

> Job vertex names get messed up once there is a source vertex chained with a
> MultipleInput vertex in job graph
> -----------------------------------------------------------------------------
>
> Key: FLINK-32680
> URL: https://issues.apache.org/jira/browse/FLINK-32680
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.16.2, 1.18.0, 1.17.1
> Reporter: Lijie Wang
> Priority: Major
> Attachments: image-2023-07-26-15-23-29-551.png,
> image-2023-07-26-15-24-24-077.png
>
> Take the following test (put it in {{MultipleInputITCase}}) as an example:
> {code:java}
> @Test
> public void testMultipleInputDoesNotChainedWithSource() throws Exception {
>     testJobVertexName(false);
> }
>
> @Test
> public void testMultipleInputChainedWithSource() throws Exception {
>     testJobVertexName(true);
> }
>
> public void testJobVertexName(boolean chain) throws Exception {
>     StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>     env.setParallelism(1);
>     TestListResultSink<Long> resultSink = new TestListResultSink<>();
>     DataStream<Long> source1 = env.fromSequence(0L, 3L).name("source1");
>     DataStream<Long> source2 = env.fromElements(4L, 6L).name("source2");
>     DataStream<Long> source3 = env.fromElements(7L, 9L).name("source3");
>     KeyedMultipleInputTransformation<Long> transform =
>             new KeyedMultipleInputTransformation<>(
>                     "MultipleInput",
>                     new KeyedSumMultipleInputOperatorFactory(),
>                     BasicTypeInfo.LONG_TYPE_INFO,
>                     1,
>                     BasicTypeInfo.LONG_TYPE_INFO);
>     if (chain) {
>         transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES);
>     }
>     KeySelector<Long, Long> keySelector = (KeySelector<Long, Long>) value -> value % 3;
>     env.addOperator(
>             transform
>                     .addInput(source1.getTransformation(), keySelector)
>                     .addInput(source2.getTransformation(), keySelector)
>                     .addInput(source3.getTransformation(), keySelector));
>     new MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink");
>     env.execute();
> }{code}
>
> When we run {{testMultipleInputDoesNotChainedWithSource}}, all job vertex
> names are normal:
> !image-2023-07-26-15-24-24-077.png|width=494,height=246!
> When we run {{testMultipleInputChainedWithSource}} (the MultipleInput chained
> with source1), job vertex names get messed up (all job vertex names contain
> {{Source: source1}}):
> !image-2023-07-26-15-23-29-551.png|width=515,height=182!
>
> I think it's a bug.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-32680) Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph
[ https://issues.apache.org/jira/browse/FLINK-32680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijie Wang reassigned FLINK-32680:
--
Assignee: Junrui Li

> Job vertex names get messed up once there is a source vertex chained with a
> MultipleInput vertex in job graph
> -----------------------------------------------------------------------------
>
> Key: FLINK-32680
> URL: https://issues.apache.org/jira/browse/FLINK-32680
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.16.2, 1.18.0, 1.17.1
> Reporter: Lijie Wang
> Assignee: Junrui Li
> Priority: Major
> Attachments: image-2023-07-26-15-23-29-551.png,
> image-2023-07-26-15-24-24-077.png
>
> Take the following test (put it in {{MultipleInputITCase}}) as an example:
> {code:java}
> @Test
> public void testMultipleInputDoesNotChainedWithSource() throws Exception {
>     testJobVertexName(false);
> }
>
> @Test
> public void testMultipleInputChainedWithSource() throws Exception {
>     testJobVertexName(true);
> }
>
> public void testJobVertexName(boolean chain) throws Exception {
>     StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>     env.setParallelism(1);
>     TestListResultSink<Long> resultSink = new TestListResultSink<>();
>     DataStream<Long> source1 = env.fromSequence(0L, 3L).name("source1");
>     DataStream<Long> source2 = env.fromElements(4L, 6L).name("source2");
>     DataStream<Long> source3 = env.fromElements(7L, 9L).name("source3");
>     KeyedMultipleInputTransformation<Long> transform =
>             new KeyedMultipleInputTransformation<>(
>                     "MultipleInput",
>                     new KeyedSumMultipleInputOperatorFactory(),
>                     BasicTypeInfo.LONG_TYPE_INFO,
>                     1,
>                     BasicTypeInfo.LONG_TYPE_INFO);
>     if (chain) {
>         transform.setChainingStrategy(ChainingStrategy.HEAD_WITH_SOURCES);
>     }
>     KeySelector<Long, Long> keySelector = (KeySelector<Long, Long>) value -> value % 3;
>     env.addOperator(
>             transform
>                     .addInput(source1.getTransformation(), keySelector)
>                     .addInput(source2.getTransformation(), keySelector)
>                     .addInput(source3.getTransformation(), keySelector));
>     new MultipleConnectedStreams(env).transform(transform).rebalance().addSink(resultSink).name("sink");
>     env.execute();
> }{code}
>
> When we run {{testMultipleInputDoesNotChainedWithSource}}, all job vertex
> names are normal:
> !image-2023-07-26-15-24-24-077.png|width=494,height=246!
> When we run {{testMultipleInputChainedWithSource}} (the MultipleInput chained
> with source1), job vertex names get messed up (all job vertex names contain
> {{Source: source1}}):
> !image-2023-07-26-15-23-29-551.png|width=515,height=182!
>
> I think it's a bug.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] flinkbot commented on pull request #23080: [FLINK-32683][rpc] Update Pekko from 1.0.0 to 1.0.1
flinkbot commented on PR #23080: URL: https://github.com/apache/flink/pull/23080#issuecomment-1651353523 ## CI report: * 8e2e4705345c21e24c40c882cc24d74d7f094be6 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-32653) Add doc for catalog store
[ https://issues.apache.org/jira/browse/FLINK-32653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747425#comment-17747425 ] Feng Jin commented on FLINK-32653: -- [~Leonard] please assign this task to me when you are free. Thanks. > Add doc for catalog store > - > > Key: FLINK-32653 > URL: https://issues.apache.org/jira/browse/FLINK-32653 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Feng Jin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32270) Heartbeat timeout in AggregateReduceGroupingITCase.testAggOnLeftJoin on AZP
[ https://issues.apache.org/jira/browse/FLINK-32270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747427#comment-17747427 ] Qingsheng Ren commented on FLINK-32270:
---
Downgraded to Major as it only appeared once in two months.

> Heartbeat timeout in AggregateReduceGroupingITCase.testAggOnLeftJoin on AZP
> ---------------------------------------------------------------------------
>
> Key: FLINK-32270
> URL: https://issues.apache.org/jira/browse/FLINK-32270
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Table SQL / Planner
> Affects Versions: 1.18.0
> Reporter: Sergey Nuyanzin
> Assignee: Roman Khachatryan
> Priority: Critical
> Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=49590&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=12613
> {noformat}
> Jun 03 02:28:48 ... 4 more
> Jun 03 02:28:48 Caused by: java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id c4f0f11f-a01f-4627-88f0-7cf92bc9994b timed out.
> Jun 03 02:28:48 ... 30 more
> Jun 03 02:28:48
> {noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32270) Heartbeat timeout in AggregateReduceGroupingITCase.testAggOnLeftJoin on AZP
[ https://issues.apache.org/jira/browse/FLINK-32270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qingsheng Ren updated FLINK-32270: -- Priority: Major (was: Critical) > Heartbeat timeout in AggregateReduceGroupingITCase.testAggOnLeftJoin on AZP > --- > > Key: FLINK-32270 > URL: https://issues.apache.org/jira/browse/FLINK-32270 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Table SQL / Planner >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Assignee: Roman Khachatryan >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=49590&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=12613 > {noformat} > Jun 03 02:28:48 ... 4 more > Jun 03 02:28:48 Caused by: java.util.concurrent.TimeoutException: Heartbeat > of TaskManager with id c4f0f11f-a01f-4627-88f0-7cf92bc9994b timed out. > Jun 03 02:28:48 ... 30 more > Jun 03 02:28:48 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32683) Update Pekko from 1.0.0 to 1.0.1
[ https://issues.apache.org/jira/browse/FLINK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32683: -- Affects Version/s: 1.18.0 > Update Pekko from 1.0.0 to 1.0.1 > > > Key: FLINK-32683 > URL: https://issues.apache.org/jira/browse/FLINK-32683 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.18.0 >Reporter: Matthew de Detrich >Priority: Major > Labels: pull-request-available > > Updates Pekko dependency to 1.0.1 which contains the following bugfix > [https://github.com/apache/incubator-pekko/pull/492] . See > [https://github.com/apache/incubator-pekko/issues/491] for more info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32683) Update Pekko from 1.0.0 to 1.0.1
[ https://issues.apache.org/jira/browse/FLINK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32683: -- Component/s: Runtime / Coordination > Update Pekko from 1.0.0 to 1.0.1 > > > Key: FLINK-32683 > URL: https://issues.apache.org/jira/browse/FLINK-32683 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthew de Detrich >Priority: Major > Labels: pull-request-available > > Updates Pekko dependency to 1.0.1 which contains the following bugfix > [https://github.com/apache/incubator-pekko/pull/492] . See > [https://github.com/apache/incubator-pekko/issues/491] for more info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-26541) SQL Client should support submitting SQL jobs in application mode
[ https://issues.apache.org/jira/browse/FLINK-26541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747428#comment-17747428 ] shizhengchao commented on FLINK-26541:
---
Why not use session mode? If application mode supports SQL, then I think there is no difference from session mode.

> SQL Client should support submitting SQL jobs in application mode
> -----------------------------------------------------------------
>
> Key: FLINK-26541
> URL: https://issues.apache.org/jira/browse/FLINK-26541
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes, Deployment / YARN, Table SQL / Client
> Reporter: Jark Wu
> Priority: Major
>
> Currently, the SQL Client only supports submitting jobs in session mode and per-job mode. As the community is going to drop the per-job mode (FLINK-26000), SQL Client should support application mode as well. Otherwise, SQL Client can only submit SQL in session mode, but streaming jobs should be submitted in per-job or application mode to have better resource isolation.
> Discussions: https://lists.apache.org/thread/2yq351nb721x23rz1q8qlyf2tqrk147r

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32604) PyFlink end-to-end fails with kafka-server-stop.sh: No such file or directory
[ https://issues.apache.org/jira/browse/FLINK-32604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747430#comment-17747430 ] Huang Xingbo commented on FLINK-32604:
---
It is a download error. We can keep observing how frequently this case fails.

> PyFlink end-to-end fails with kafka-server-stop.sh: No such file or
> directory
> -------------------------------------------------------------------
>
> Key: FLINK-32604
> URL: https://issues.apache.org/jira/browse/FLINK-32604
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Affects Versions: 1.18.0
> Reporter: Sergey Nuyanzin
> Priority: Critical
> Labels: test-stability
>
> This build https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51253&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=7883 fails as
> {noformat}
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/kafka-common.sh: line 117: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-27379214502/kafka_2.12-3.2.3/bin/kafka-server-stop.sh: No such file or directory
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/kafka-common.sh: line 121: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-27379214502/kafka_2.12-3.2.3/bin/zookeeper-server-stop.sh: No such file or directory
> Jul 13 19:43:07 [FAIL] Test script contains errors.
> Jul 13 19:43:07 Checking of logs skipped.
> Jul 13 19:43:07
> Jul 13 19:43:07 [FAIL] 'PyFlink end-to-end test' failed after 0 minutes and 40 seconds! Test exited with exit code 1
> Jul 13 19:43:07
> {noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-32653) Add doc for catalog store
[ https://issues.apache.org/jira/browse/FLINK-32653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonard Xu reassigned FLINK-32653: -- Assignee: Feng Jin > Add doc for catalog store > - > > Key: FLINK-32653 > URL: https://issues.apache.org/jira/browse/FLINK-32653 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Feng Jin >Assignee: Feng Jin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-32683) Update Pekko from 1.0.0 to 1.0.1
[ https://issues.apache.org/jira/browse/FLINK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-32683: - Assignee: Matthew de Detrich > Update Pekko from 1.0.0 to 1.0.1 > > > Key: FLINK-32683 > URL: https://issues.apache.org/jira/browse/FLINK-32683 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthew de Detrich >Assignee: Matthew de Detrich >Priority: Major > Labels: pull-request-available > > Updates Pekko dependency to 1.0.1 which contains the following bugfix > [https://github.com/apache/incubator-pekko/pull/492] . See > [https://github.com/apache/incubator-pekko/issues/491] for more info -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] flinkbot commented on pull request #23081: [BP-1.17][FLINK-32628] Fix build_wheels_on_macos fails on AZP
flinkbot commented on PR #23081: URL: https://github.com/apache/flink/pull/23081#issuecomment-1651494255 ## CI report: * 2e3b4d260f097087f6bfb32568db15388af8898b UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-32668) fix up watchdog timeout error msg in common.sh(e2e test)
[ https://issues.apache.org/jira/browse/FLINK-32668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747441#comment-17747441 ] Matthias Pohl commented on FLINK-32668:
---
As far as I can see, this failure can only occur if the watchdog thread exited on its own, or if the watchdog process was killed by some other process (which would be problematic if the corresponding test is still running). The latter case is the problematic one, because we might miss killing the test. We should throw an error in that case. We don't have to worry if both processes are already killed (in this situation, maybe even a warning is not needed but rather an informational output). WDYT?

> fix up watchdog timeout error msg in common.sh(e2e test)
> --------------------------------------------------------
>
> Key: FLINK-32668
> URL: https://issues.apache.org/jira/browse/FLINK-32668
> Project: Flink
> Issue Type: Bug
> Components: Build System / CI
> Affects Versions: 1.16.2, 1.18.0, 1.17.1
> Reporter: Hongshun Wang
> Assignee: Hongshun Wang
> Priority: Minor
> Attachments: image-2023-07-25-15-27-37-441.png
>
> When running an e2e test, an error like this occurs:
> !image-2023-07-25-15-27-37-441.png|width=733,height=115!
>
> The corresponding code:
> {code:java}
> kill_test_watchdog() {
>     local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
>     echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
>     kill $watchdog_pid
> }
>
> internal_run_with_timeout() {
>     local timeout_in_seconds="$1"
>     local on_failure="$2"
>     local command_label="$3"
>     local command="${@:4}"
>
>     on_exit kill_test_watchdog
>
>     (
>         command_pid=$BASHPID
>         (sleep "${timeout_in_seconds}" # set a timeout for this command
>         echo "${command_label:-"The command '${command}'"} (pid: $command_pid) did not finish after $timeout_in_seconds seconds."
>         eval "${on_failure}"
>         kill "$command_pid") & watchdog_pid=$!
>         echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid
>         # invoke
>         $command
>     )
> }{code}
>
> When {{$command}} completes before the timeout, the watchdog process is killed successfully. However, when {{$command}} times out, the watchdog process kills {{$command}} and then exits itself, leaving behind an error message when it later tries to kill its own process ID with {{kill $watchdog_pid}}. This error message, "no such process", is hard to understand.
>
> So, I will modify it like this, with a better error message:
> {code:java}
> kill_test_watchdog() {
>     local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
>     if kill -0 $watchdog_pid > /dev/null 2>&1; then
>         echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
>         kill $watchdog_pid
>     else
>         echo "[ERROR] Test is timeout"
>         exit 1
>     fi
> } {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
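The liveness check Hongshun proposes (`kill -0`, which succeeds only while the target process still exists) can be exercised with a standalone sketch. For the demo, `kill_test_watchdog` takes the pid as an argument instead of reading `$TEST_DATA_DIR/job_watchdog.pid`, and returns instead of exiting, so both code paths can run in one script:

```shell
#!/usr/bin/env bash
# Sketch of the proposed fix: probe the watchdog with `kill -0` before
# killing it, so a watchdog that already exited (i.e. the test timed out)
# produces a clear error instead of "kill: no such process".

kill_test_watchdog() {
  local watchdog_pid=$1
  if kill -0 "$watchdog_pid" > /dev/null 2>&1; then
    echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
    kill "$watchdog_pid"
  else
    echo "[ERROR] Test is timeout"
    return 1  # the real fix calls `exit 1` to fail the test run
  fi
}

# Case 1: the watched command finished in time, the watchdog is still alive.
sleep 60 & live_pid=$!
kill_test_watchdog "$live_pid"

# Case 2: the watchdog is already gone (the command timed out earlier).
true & dead_pid=$!
wait "$dead_pid"
kill_test_watchdog "$dead_pid" || echo "timeout path reported cleanly"
```

Note that `kill -0` still succeeds for an unreaped (zombie) child, which is why the demo calls `wait` before probing the dead pid.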
[GitHub] [flink] snuyanzin commented on pull request #23045: [FLINK-32628][tests][ci] Fix build_wheels_on_macos fails on AZP
snuyanzin commented on PR #23045: URL: https://github.com/apache/flink/pull/23045#issuecomment-1651505210 Thanks for having a look. Yes, sure, the backports are created; once they pass CI I will merge them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] luoyuxia commented on a diff in pull request #23075: [FLINK-32674][doc] Add documentation for the new getTargetColumns in DynamicTableSink
luoyuxia commented on code in PR #23075: URL: https://github.com/apache/flink/pull/23075#discussion_r1274747812 ## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, +this user specified target column list can be found from the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/DynamicTableSink.java" name="DynamicTableSink$Context.getTargetColumns()" >}}. Review Comment: nit ```suggestion this user-specified target column list can be found from the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/DynamicTableSink.java" name="DynamicTableSink$Context.getTargetColumns()" >}}. ``` ## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, Review Comment: Also, I'm wondering whether it's more clear to break the line in here since these lines are more likely to say other things. 
## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, Review Comment: It seems we are missing how the connector developers can avoid it. Maybe something like: ``` For connectors processing partial column updates, they can find the user-specified target column list and only update the target columns. ``` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
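To make the sink-side behavior under discussion concrete: a connector that supports partial column updates can read the user-specified target columns (in Flink, from `DynamicTableSink.Context#getTargetColumns`) and overwrite only those positions of the stored row. The merge step itself can be sketched independently of the Flink API; `PartialUpdateSketch` and `mergePartial` below are illustrative names, not part of Flink:

```java
import java.util.Arrays;

public class PartialUpdateSketch {
    // Merge an incoming row into the stored row, overwriting only the
    // positions listed in targetColumns; all other columns keep their
    // stored values instead of being clobbered with NULL.
    static Object[] mergePartial(Object[] stored, Object[] incoming, int[] targetColumns) {
        Object[] result = Arrays.copyOf(stored, stored.length);
        for (int col : targetColumns) {
            result[col] = incoming[col];
        }
        return result;
    }

    public static void main(String[] args) {
        // Table T(a, b, c); INSERT INTO T(c, b) targets columns 2 and 1 only.
        Object[] stored = {10, 20, 30};
        Object[] incoming = {null, 99, 88}; // the planner padded 'a' with NULL
        Object[] merged = mergePartial(stored, incoming, new int[] {2, 1});
        System.out.println(Arrays.toString(merged)); // [10, 99, 88]
    }
}
```

Without the target-column information, the sink would have to treat the padded NULL in column 'a' as a real update, which is exactly the overwrite the doc change warns connector developers about.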
[GitHub] [flink] flinkbot commented on pull request #23082: [BP-1.16][FLINK-32628] Fix build_wheels_on_macos fails on AZP
flinkbot commented on PR #23082: URL: https://github.com/apache/flink/pull/23082#issuecomment-1651513201 ## CI report: * 58f22d778efa8ea4170919ba61ffe3f234097bf6 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] afedulov commented on a diff in pull request #637: [FLINK-29634] Support periodic checkpoint triggering
afedulov commented on code in PR #637: URL: https://github.com/apache/flink-kubernetes-operator/pull/637#discussion_r1274823122 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java: ## @@ -594,6 +638,58 @@ public SavepointFetchResult fetchSavepointInfo( } } +@Override +public CheckpointFetchResult fetchCheckpointInfo( Review Comment: Oh, missed that one. Added. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] HuangXingBo closed pull request #23045: [FLINK-32628][tests][ci] Fix build_wheels_on_macos fails on AZP
HuangXingBo closed pull request #23045: [FLINK-32628][tests][ci] Fix build_wheels_on_macos fails on AZP URL: https://github.com/apache/flink/pull/23045 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] afedulov commented on a diff in pull request #637: [FLINK-29634] Support periodic checkpoint triggering
afedulov commented on code in PR #637: URL: https://github.com/apache/flink-kubernetes-operator/pull/637#discussion_r1274827242 ## flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/SnapshotObserver.java: ## @@ -74,8 +80,32 @@ public void observeSavepointStatus(FlinkResourceContext ctx) { cleanupSavepointHistory(ctx, savepointInfo); } +public void observeCheckpointStatus(FlinkResourceContext ctx) { +if (!isCheckpointsTriggeringSupported(ctx.getObserveConfig())) { +return; +} +var resource = ctx.getResource(); +var jobStatus = resource.getStatus().getJobStatus(); +var checkpointInfo = jobStatus.getSavepointInfo(); +var jobId = jobStatus.getJobId(); + +// If any manual or periodic savepoint is in progress, observe it +if (SnapshotUtils.checkpointInProgress(jobStatus)) { +observeTriggeredCheckpoint(ctx, jobId); +} + +// REVIEW: clarify if this is relevant for checkpoints. +/* +// If job is in globally terminal state, observe last savepoint +if (ReconciliationUtils.isJobInTerminalState(resource.getStatus())) { +observeLatestCheckpoint( +ctx.getFlinkService(), checkpointInfo, jobId, ctx.getObserveConfig()); +} +*/ Review Comment: Thanks, I removed that bit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] afedulov commented on pull request #637: [FLINK-29634] Support periodic checkpoint triggering
afedulov commented on PR #637: URL: https://github.com/apache/flink-kubernetes-operator/pull/637#issuecomment-1651622614 @mxm @gyfora thanks for the review. I addressed the comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-32684) Renaming AkkaOptions into PekkoOptions
Matthias Pohl created FLINK-32684:
-
Summary: Renaming AkkaOptions into PekkoOptions
Key: FLINK-32684
URL: https://issues.apache.org/jira/browse/FLINK-32684
Project: Flink
Issue Type: Technical Debt
Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Matthias Pohl
Fix For: 2.0.0

FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed, as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32685) Benchmark regression on sortedMultiInput and sortedTwoInput since 2023-07-18
Martijn Visser created FLINK-32685: -- Summary: Benchmark regression on sortedMultiInput and sortedTwoInput since 2023-07-18 Key: FLINK-32685 URL: https://issues.apache.org/jira/browse/FLINK-32685 Project: Flink Issue Type: Bug Affects Versions: 1.18.0 Reporter: Martijn Visser http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=sortedMultiInput&extr=on&quarts=on&equid=off&env=2&revs=200 http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=sortedTwoInput&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32686) Benchmark regression on startScheduling.BATCH and startScheduling.STREAMING since 2023-07-24
Martijn Visser created FLINK-32686: -- Summary: Benchmark regression on startScheduling.BATCH and startScheduling.STREAMING since 2023-07-24 Key: FLINK-32686 URL: https://issues.apache.org/jira/browse/FLINK-32686 Project: Flink Issue Type: Bug Affects Versions: 1.18.0 Reporter: Martijn Visser http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=startScheduling.STREAMING&extr=on&quarts=on&equid=off&env=2&revs=200 http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=startScheduling.BATCH&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32686) Performance regression on startScheduling.BATCH and startScheduling.STREAMING since 2023-07-24
[ https://issues.apache.org/jira/browse/FLINK-32686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser updated FLINK-32686: --- Summary: Performance regression on startScheduling.BATCH and startScheduling.STREAMING since 2023-07-24 (was: Benchmark regression on startScheduling.BATCH and startScheduling.STREAMING since 2023-07-24 ) > Performance regression on startScheduling.BATCH and startScheduling.STREAMING > since 2023-07-24 > --- > > Key: FLINK-32686 > URL: https://issues.apache.org/jira/browse/FLINK-32686 > Project: Flink > Issue Type: Bug >Affects Versions: 1.18.0 >Reporter: Martijn Visser >Priority: Blocker > > http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=startScheduling.STREAMING&extr=on&quarts=on&equid=off&env=2&revs=200 > http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=startScheduling.BATCH&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32685) Performance regression on sortedMultiInput and sortedTwoInput since 2023-07-18
[ https://issues.apache.org/jira/browse/FLINK-32685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser updated FLINK-32685: --- Summary: Performance regression on sortedMultiInput and sortedTwoInput since 2023-07-18 (was: Benchmark regression on sortedMultiInput and sortedTwoInput since 2023-07-18) > Performance regression on sortedMultiInput and sortedTwoInput since 2023-07-18 > -- > > Key: FLINK-32685 > URL: https://issues.apache.org/jira/browse/FLINK-32685 > Project: Flink > Issue Type: Bug >Affects Versions: 1.18.0 >Reporter: Martijn Visser >Priority: Blocker > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=sortedMultiInput&extr=on&quarts=on&equid=off&env=2&revs=200 > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=sortedTwoInput&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32687) Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY since 2023-07-23
Martijn Visser created FLINK-32687: -- Summary: Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY since 2023-07-23 Key: FLINK-32687 URL: https://issues.apache.org/jira/browse/FLINK-32687 Project: Flink Issue Type: Bug Affects Versions: 1.18.0 Reporter: Martijn Visser http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32688) Remove deprecated exception history fields
Matthias Pohl created FLINK-32688: - Summary: Remove deprecated exception history fields Key: FLINK-32688 URL: https://issues.apache.org/jira/browse/FLINK-32688 Project: Flink Issue Type: Sub-task Components: Runtime / REST Affects Versions: 1.18.0 Reporter: Matthias Pohl Fix For: 2.0.0 The fields were already marked as deprecated (see [JobExceptionInfo|https://github.com/apache/flink/blob/a49f1aaec6239401cc8b1dac731d290e95290caf/flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/JobExceptionsInfo.java#L35]) but were not discussed as part of a FLIP. Working on this issue would require creating a FLIP to cover the REST API change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] lincoln-lil commented on a diff in pull request #23075: [FLINK-32674][doc] Add documentation for the new getTargetColumns in DynamicTableSink
lincoln-lil commented on code in PR #23075: URL: https://github.com/apache/flink/pull/23075#discussion_r1274930002 ## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, +this user specified target column list can be found from the {{< gh_link file="flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/sink/DynamicTableSink.java" name="DynamicTableSink$Context.getTargetColumns()" >}}. Review Comment: if the above new version is ok, then this dash will not be added ## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, Review Comment: How about: "For connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, you can get the information about the target columns specified by the user's insert statement from 'getTargetColumns(link...)' and decide how to process the partial updates." 
## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, Review Comment: current version:  a new line seems to be better, will update it. ## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, Review Comment: The corresponding Chinese version (translated to English): Connector developers who, when processing partial column updates, want to avoid overwriting non-target columns with null values can obtain the target columns specified by the user's insert statement from "getTargetColumns(link...)" and then decide how to handle the partial update. ## docs/content/docs/dev/table/sql/insert.md: ## @@ -208,7 +208,9 @@ column_list: **COLUMN LIST** Given a table T(a INT, b INT, c INT), Flink supports INSERT INTO T(c, b) SELECT x, y FROM S. The expectation is -that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). +that 'x' is written to column 'c' and 'y' is written to column 'b' and 'a' is set to NULL (assuming column 'a' is nullable). 
+For the connector developers who want to avoid overwriting non-target columns with null values when processing partial column updates, Review Comment: The corresponding Chinese version (translated to English): Connector developers who, when processing partial column updates, want to avoid overwriting non-target columns with null values can obtain the target columns specified by the user's insert statement from "getTargetColumns(link...)" and then decide how to handle the partial update. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
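As a sketch of how a connector might consume this information: the snippet below mocks the `getTargetColumns()` contract discussed in this thread (an `Optional<int[][]>` of column index paths, one per user-specified target column). All class, interface, and helper names here are illustrative stand-ins, not Flink's actual classes; only the `getTargetColumns()` idea is taken from the PR under review.

```java
import java.util.Optional;

// Minimal, self-contained mock of the assumed DynamicTableSink.Context#getTargetColumns()
// contract (Optional<int[][]>, one index path per user-specified target column).
// All names below are illustrative; this is not Flink's actual API surface.
public class TargetColumnsDemo {
    interface Context {
        Optional<int[][]> getTargetColumns();
    }

    /** Returns true if column index `col` was listed in INSERT INTO T(...). */
    static boolean isTargetColumn(Context ctx, int col) {
        // An empty Optional means "no explicit column list": treat every column as a target.
        int[][] targets = ctx.getTargetColumns().orElse(null);
        if (targets == null) {
            return true;
        }
        for (int[] path : targets) {
            if (path[0] == col) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // For table T(a, b, c): INSERT INTO T(c, b) ... -> target columns [[2], [1]].
        Context ctx = () -> Optional.of(new int[][] {{2}, {1}});
        System.out.println(isTargetColumn(ctx, 0)); // 'a' is not a target -> keep old value
        System.out.println(isTargetColumn(ctx, 2)); // 'c' is a target -> overwrite
    }
}
```

A partial-update sink could use such a check to skip writing null into non-target columns instead of blindly overwriting the whole row.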
[jira] [Updated] (FLINK-32645) Flink pulsar sink is having poor performance
[ https://issues.apache.org/jira/browse/FLINK-32645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zili Chen updated FLINK-32645: -- Fix Version/s: pulsar-3.0.2 > Flink pulsar sink is having poor performance > > > Key: FLINK-32645 > URL: https://issues.apache.org/jira/browse/FLINK-32645 > Project: Flink > Issue Type: Bug > Components: Connectors / Pulsar >Affects Versions: 1.16.2 > Environment: !Screenshot 2023-07-22 at 1.59.42 PM.png!!Screenshot > 2023-07-22 at 2.03.53 PM.png! > >Reporter: Vijaya Bhaskar V >Assignee: Zili Chen >Priority: Major > Fix For: pulsar-3.0.2 > > Attachments: Screenshot 2023-07-22 at 2.03.53 PM.png, Screenshot > 2023-07-22 at 2.56.55 PM.png, Screenshot 2023-07-22 at 3.45.21 PM-1.png, > Screenshot 2023-07-22 at 3.45.21 PM.png, pom.xml > > > Found the following issue with the Flink Pulsar sink: > > The Pulsar sink is always waiting while enqueueing messages, keeping the task > slot busy no matter how many free slots we provide. A screenshot of the same > is attached. > When sending messages at a modest rate of 8K msg/sec, a standalone Flink job > with a discarding sink is able to receive the full rate of 8K msg/sec, > whereas the Pulsar sink was consuming only up to 2K msg/sec and is > always busy waiting. A snapshot of the thread dump is attached, as well as a > snapshot of the Flink stream graph. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] XComp commented on a diff in pull request #22987: [FLINK-32583][rest] Fix deadlock in RestClient
XComp commented on code in PR #22987: URL: https://github.com/apache/flink/pull/22987#discussion_r1274946566 ## flink-runtime/src/test/java/org/apache/flink/runtime/rest/RestClientTest.java: ## @@ -207,6 +218,120 @@ public void testRestClientClosedHandling() throws Exception { } } +/** + * Tests that the futures returned by {@link RestClient} fail immediately if the client is + * already closed. + * + * See FLINK-32583 + */ +@Test +public void testCloseClientBeforeRequest() throws Exception { +try (final RestClient restClient = +new RestClient(new Configuration(), Executors.directExecutor())) { +restClient.close(); // Intentionally close the client prior to the request + +CompletableFuture future = +restClient.sendRequest( +unroutableIp, +80, +new TestMessageHeaders(), +EmptyMessageParameters.getInstance(), +EmptyRequestBody.getInstance()); + +// Call get() on the future with a timeout of 0s so we can test that the exception +// thrown is not a TimeoutException, which is what would be thrown if restClient were +// not already closed +final ThrowingRunnable getFuture = () -> future.get(0, TimeUnit.SECONDS); + +final Throwable cause = assertThrows(ExecutionException.class, getFuture).getCause(); +assertThat(cause, instanceOf(IllegalStateException.class)); +assertThat(cause.getMessage(), equalTo("RestClient is already closed")); +} +} + +@Test +public void testCloseClientWhileProcessingRequest() throws Exception { +// Set up a Netty SelectStrategy with latches that allow us to step forward through Netty's +// request state machine, closing the client at a particular moment +final OneShotLatch connectTriggered = new OneShotLatch(); +final OneShotLatch closeTriggered = new OneShotLatch(); +final SelectStrategy fallbackSelectStrategy = +DefaultSelectStrategyFactory.INSTANCE.newSelectStrategy(); +final SelectStrategyFactory selectStrategyFactory = +() -> +(selectSupplier, hasTasks) -> { +connectTriggered.trigger(); +closeTriggered.awaitQuietly(1, TimeUnit.SECONDS); Review 
Comment: ```suggestion closeTriggered.awaitQuietly(); ``` The community agreed on not using timeouts in tests (see [Flink Coding Conventions](https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#avoid-timeouts-in-junit-tests)) since they are a source of instabilities. Instead the test would run until the overall CI build is cancelled after ~4hrs. ## flink-runtime/src/test/java/org/apache/flink/runtime/rest/RestClientTest.java: ## @@ -207,6 +218,120 @@ public void testRestClientClosedHandling() throws Exception { } } +/** + * Tests that the futures returned by {@link RestClient} fail immediately if the client is + * already closed. + * + * See FLINK-32583 + */ +@Test +public void testCloseClientBeforeRequest() throws Exception { +try (final RestClient restClient = +new RestClient(new Configuration(), Executors.directExecutor())) { +restClient.close(); // Intentionally close the client prior to the request + +CompletableFuture future = +restClient.sendRequest( +unroutableIp, +80, +new TestMessageHeaders(), +EmptyMessageParameters.getInstance(), +EmptyRequestBody.getInstance()); + +// Call get() on the future with a timeout of 0s so we can test that the exception +// thrown is not a TimeoutException, which is what would be thrown if restClient were +// not already closed +final ThrowingRunnable getFuture = () -> future.get(0, TimeUnit.SECONDS); + +final Throwable cause = assertThrows(ExecutionException.class, getFuture).getCause(); +assertThat(cause, instanceOf(IllegalStateException.class)); +assertThat(cause.getMessage(), equalTo("RestClient is already closed")); +} +} + +@Test +public void testCloseClientWhileProcessingRequest() throws Exception { +// Set up a Netty SelectStrategy with latches that allow us to step forward through Netty's +// request state machine, closing the client at a particular moment +final OneShotLatch connectTriggered = new OneShotLatch(); +final OneShotLatch closeTriggered = new OneShotLatch(); +f
[jira] [Assigned] (FLINK-28050) Introduce Source API alternative to SourceExecutionContext#fromCollection(*) methods
[ https://issues.apache.org/jira/browse/FLINK-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Fedulov reassigned FLINK-28050: - Assignee: Alexander Fedulov > Introduce Source API alternative to SourceExecutionContext#fromCollection(*) > methods > > > Key: FLINK-28050 > URL: https://issues.apache.org/jira/browse/FLINK-28050 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-kubernetes-operator] gyfora merged pull request #638: Update doc to ensure that users keep in mind how rollback feature works
gyfora merged PR #638: URL: https://github.com/apache/flink-kubernetes-operator/pull/638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-29281) Replace Akka by gRPC-based RPC implementation
[ https://issues.apache.org/jira/browse/FLINK-29281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747530#comment-17747530 ] Ferenc Csaky commented on FLINK-29281: -- [~chesnay] does this annul the Akka Artery migration (FLINK-28372) at this point? The migration to Pekko, an Akka 2.6 fork, is now done, which I guess does not change the semantics of the Artery migration; but if this work is within reach, putting more effort into Artery and updating/completing the existing draft may not be worth it. > Replace Akka by gRPC-based RPC implementation > - > > Key: FLINK-29281 > URL: https://issues.apache.org/jira/browse/FLINK-29281 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / RPC >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > > Following the license change I propose to eventually replace Akka. > Based on LEGAL-619 an exemption is not feasible, and while a fork _may_ be > created its long-term future is up in the air and I'd be uncomfortable with > relying on it. > I've been experimenting with a new RPC implementation based on gRPC and so > far I'm quite optimistic. It's also based on Netty while not requiring as > much of a tight coupling as Akka did. > This would also allow us to sidestep migrating our current Akka setup from > Netty 3 (which is affected by several CVEs) to Akka Artery, both saving work > and not introducing an entirely different network stack to the project. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] dawidwys opened a new pull request, #23083: [FLINK-32682] Make it possible to use query time based time functions in streaming mode
dawidwys opened a new pull request, #23083: URL: https://github.com/apache/flink/pull/23083 ## What is the purpose of the change This PR introduces the option as discussed in FLIP-162 future work section to choose how time based functions should be evaluated. ## Verifying this change Added test in org.apache.flink.table.planner.runtime.stream.sql.TimeFunctionsITCase ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**yes** / no) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (**yes** / no) - If yes, how is the feature documented? (not applicable / **docs** / **JavaDocs** / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32682) Introduce option for choosing time function evaluation methods
[ https://issues.apache.org/jira/browse/FLINK-32682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32682: --- Labels: pull-request-available (was: ) > Introduce option for choosing time function evaluation methods > -- > > Key: FLINK-32682 > URL: https://issues.apache.org/jira/browse/FLINK-32682 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner, Table SQL / Runtime >Reporter: Dawid Wysakowicz >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > In [FLIP-162|https://cwiki.apache.org/confluence/x/KAxRCg] as future plans it > was discussed to introduce an option {{table.exec.time-function-evaluation}} > to control evaluation method of time function. > We should add this option to be able to evaluate time functions with > {{query-time}} method in streaming mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32687) Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY since 2023-07-23
[ https://issues.apache.org/jira/browse/FLINK-32687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747533#comment-17747533 ] Weihua Hu commented on FLINK-32687: --- [~martijnvisser] Thanks for reporting this. This regression is caused by https://github.com/apache/flink/pull/22913. That PR improves failover performance: the time cost for the STREAMING scenario is reduced by 80%, and for the BATCH and STREAMING_EVENLY scenarios by 20%. There is a certain performance regression in the BATCH_EVENLY scenario, which I think is acceptable, because: 1) BATCH_EVENLY is a strategy that is unlikely to be used in production; batch tasks run for a short time, and resources can be released when they finish; 2) this only affects the failover process, and some batch tasks (with blocking shuffle) will not trigger a global failover. > Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY > since 2023-07-23 > - > > Key: FLINK-32687 > URL: https://issues.apache.org/jira/browse/FLINK-32687 > Project: Flink > Issue Type: Bug >Affects Versions: 1.18.0 >Reporter: Martijn Visser >Priority: Blocker > > http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] flinkbot commented on pull request #23083: [FLINK-32682] Make it possible to use query time based time functions in streaming mode
flinkbot commented on PR #23083: URL: https://github.com/apache/flink/pull/23083#issuecomment-1651874232 ## CI report: * 259010ae232c9677a1aff9d7c2103bb73c477db1 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-32687) Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY since 2023-07-23
[ https://issues.apache.org/jira/browse/FLINK-32687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747539#comment-17747539 ] Martijn Visser commented on FLINK-32687: [~huwh] Thanks for the explainer, let's close it then :) > Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY > since 2023-07-23 > - > > Key: FLINK-32687 > URL: https://issues.apache.org/jira/browse/FLINK-32687 > Project: Flink > Issue Type: Bug >Affects Versions: 1.18.0 >Reporter: Martijn Visser >Priority: Blocker > > http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-32682) Introduce option for choosing time function evaluation methods
[ https://issues.apache.org/jira/browse/FLINK-32682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz reassigned FLINK-32682: Assignee: Dawid Wysakowicz > Introduce option for choosing time function evaluation methods > -- > > Key: FLINK-32682 > URL: https://issues.apache.org/jira/browse/FLINK-32682 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner, Table SQL / Runtime >Reporter: Dawid Wysakowicz >Assignee: Dawid Wysakowicz >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > In [FLIP-162|https://cwiki.apache.org/confluence/x/KAxRCg] as future plans it > was discussed to introduce an option {{table.exec.time-function-evaluation}} > to control evaluation method of time function. > We should add this option to be able to evaluate time functions with > {{query-time}} method in streaming mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32687) Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY since 2023-07-23
[ https://issues.apache.org/jira/browse/FLINK-32687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser closed FLINK-32687. -- Resolution: Not A Problem > Performance regression on handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY > since 2023-07-23 > - > > Key: FLINK-32687 > URL: https://issues.apache.org/jira/browse/FLINK-32687 > Project: Flink > Issue Type: Bug >Affects Versions: 1.18.0 >Reporter: Martijn Visser >Priority: Blocker > > http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=handleGlobalFailureAndRestartAllTasks.BATCH_EVENLY&extr=on&quarts=on&equid=off&env=2&revs=200 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29281) Replace Akka by gRPC-based RPC implementation
[ https://issues.apache.org/jira/browse/FLINK-29281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747553#comment-17747553 ] Claude Warren commented on FLINK-29281: --- Pekko v1.0.1 was just released. [d...@pekko.apache.org|mailto:d...@pekko.apache.org] is the dev list. Announcement email: https://lists.apache.org/thread/c10qoktyq0tv7t1bo14nxv2dt3s3sf2b > Replace Akka by gRPC-based RPC implementation > - > > Key: FLINK-29281 > URL: https://issues.apache.org/jira/browse/FLINK-29281 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / RPC >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > > Following the license change I propose to eventually replace Akka. > Based on LEGAL-619 an exemption is not feasible, and while a fork _may_ be > created it's long-term future is up in the air and I'd be uncomfortable with > relying on it. > I've been experimenting with a new RPC implementation based on gRPC and so > far I'm quite optimistic. It's also based on Netty while not requiring as > much of a tight coupling as Akka did. > This would also allow us to sidestep migrating our current Akka setup from > Netty 3 (which is affected by several CVEs) to Akka Artery, both saving work > and not introducing an entirely different network stack to the project. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31400) Add autoscaler integration for Iceberg source
[ https://issues.apache.org/jira/browse/FLINK-31400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-31400: --- Fix Version/s: (was: kubernetes-operator-1.6.0) > Add autoscaler integration for Iceberg source > - > > Key: FLINK-31400 > URL: https://issues.apache.org/jira/browse/FLINK-31400 > Project: Flink > Issue Type: New Feature > Components: Autoscaler, Kubernetes Operator >Reporter: Maximilian Michels >Priority: Major > > A very critical part in the scaling algorithm is setting the source > processing rate correctly such that the Flink pipeline can keep up with the > ingestion rate. The autoscaler does that by looking at the {{pendingRecords}} > Flink source metric. Even if that metric is not available, the source can > still be sized according to the busyTimeMsPerSecond metric, but there will be > no backlog information available. For Kafka, the autoscaler also determines > the number of partitions to avoid scaling higher than the maximum number of > partitions. > In order to support a wider range of use cases, we should investigate an > integration with the Iceberg source. As far as I know, it does not expose the > pendingRecords metric, nor does the autoscaler know about other constraints, > e.g. the maximum number of open files. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31998) Flink Operator Deadlock on run job Failure
[ https://issues.apache.org/jira/browse/FLINK-31998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-31998: --- Fix Version/s: (was: kubernetes-operator-1.6.0) > Flink Operator Deadlock on run job Failure > -- > > Key: FLINK-31998 > URL: https://issues.apache.org/jira/browse/FLINK-31998 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.2.0, kubernetes-operator-1.3.0, > kubernetes-operator-1.4.0 >Reporter: Ahmed Hamdy >Priority: Major > Attachments: gleek-m6pLe3Wy--IpCKQavAQwBQ.png > > > h2. Description > FlinkOperator Reconciler goes into deadlock situation where it never udpates > Session job to DEPLOYED/ROLLED_BACK if {{deploy}} fails. > Attached sequence diagram of the issue where FlinkSessionJob is stuck in > UPGRADING indefinitely. > h2. proposed fix > Reconciler should roll back changes CR if > {{reconciliationStatus.isBeforeFirstDeployment()}} fails to {{{}deploy(){}}}. > [diagram|https://issues.apache.org/7239bb39-60d8-48a0-9052-f3231947edbe] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32120) Add autoscaler config option to disable parallelism key group alignment
[ https://issues.apache.org/jira/browse/FLINK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-32120: --- Fix Version/s: (was: kubernetes-operator-1.6.0) > Add autoscaler config option to disable parallelism key group alignment > --- > > Key: FLINK-32120 > URL: https://issues.apache.org/jira/browse/FLINK-32120 > Project: Flink > Issue Type: Improvement > Components: Autoscaler, Kubernetes Operator >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: kubernetes-operator-1.5.1 > > > After choosing the target parallelism for a vertex, we choose a higher > parallelism if that parallelism leads to evenly spreading the number of key > groups. The number of key groups is derived from the max parallelism. > The amount of actual skew we would introduce if we did not do the alignment > would usually be pretty low. In fact, the data itself can have an uneven load > distribution across the keys (hot keys). In this case, the key group > alignment is not effective. > For experiments, we should allow disabling the key group alignment via a > configuration option. -- This message was sent by Atlassian Jira (v8.20.10#820010)
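The key-group alignment this option would disable can be sketched as follows. This is an illustrative toy, not the operator's actual implementation: it bumps the target parallelism up to the next value that divides the key-group count (which equals the max parallelism), so every subtask handles the same number of key groups.

```java
// Illustrative sketch of key-group alignment (not the autoscaler's real code):
// choose the smallest parallelism >= target that divides the key-group count evenly.
public class KeyGroupAlignment {
    static int align(int keyGroups, int target) {
        for (int p = target; p <= keyGroups; p++) {
            if (keyGroups % p == 0) {
                return p; // every subtask gets exactly keyGroups / p key groups
            }
        }
        return keyGroups; // target beyond the key-group count: cap at keyGroups
    }

    public static void main(String[] args) {
        System.out.println(align(128, 5));  // 8  -> 16 key groups per subtask
        System.out.println(align(128, 60)); // 64 -> 2 key groups per subtask
    }
}
```

The ticket's point is visible here: aligning 5 up to 8 is a 60% overshoot in parallelism, which may not pay off when the keys themselves are skewed, hence the wish for an option to turn the alignment off.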
[jira] [Updated] (FLINK-31867) Enforce a minimum number of observations within a metric window
[ https://issues.apache.org/jira/browse/FLINK-31867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-31867: --- Fix Version/s: (was: kubernetes-operator-1.6.0) > Enforce a minimum number of observations within a metric window > --- > > Key: FLINK-31867 > URL: https://issues.apache.org/jira/browse/FLINK-31867 > Project: Flink > Issue Type: Improvement > Components: Autoscaler, Kubernetes Operator >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: pull-request-available > > The metric window is currently only time-based. We should make sure we see a > minimum number of observations to ensure we don't decide based on too few > observations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32119) Revise source partition skew logic
[ https://issues.apache.org/jira/browse/FLINK-32119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-32119: --- Fix Version/s: (was: kubernetes-operator-1.6.0) > Revise source partition skew logic > --- > > Key: FLINK-32119 > URL: https://issues.apache.org/jira/browse/FLINK-32119 > Project: Flink > Issue Type: Bug > Components: Autoscaler, Kubernetes Operator >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: kubernetes-operator-1.5.1 > > > After choosing the target parallelism for a vertex, we choose a higher > parallelism if that parallelism leads to evenly spreading the number of key > groups (=max parallelism). > Sources don't have keyed state, so this adjustment does not make sense for > key groups. However, we internally limit the max parallelism of sources to > the number of partitions discovered. This prevents partition skew. > The partition skew logic currently doesn’t work correctly when there are > multiple topics because we use the total number of partitions discovered. > Using a single max parallelism doesn’t yield skew free partition distribution > then. However, this is also true for a single topic when the number of > partitions is a prime number or a not easily divisible number. > Hence, we should add an option to guarantee skew free partition distribution > which means using the total number of partitions when another configuration > is not possible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
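The prime-partition-count problem described above can be made concrete with a small sketch. This is illustrative only (round-robin partition-to-subtask assignment, not the operator's code): a parallelism p is skew free for n partitions only when p divides n or p >= n, so for a prime like 7 every parallelism between 2 and 6 is skewed.

```java
import java.util.Arrays;

// Illustrative sketch (not the autoscaler's real code): round-robin assignment of
// n source partitions across p subtasks. Skew free only if p divides n or p >= n.
public class PartitionSkew {
    static int[] assignmentCounts(int partitions, int parallelism) {
        int[] counts = new int[parallelism];
        for (int i = 0; i < partitions; i++) {
            counts[i % parallelism]++; // partition i goes to subtask i mod p
        }
        return counts;
    }

    public static void main(String[] args) {
        // 7 partitions (a prime): parallelism 3 is skewed, parallelism 7 is not.
        System.out.println(Arrays.toString(assignmentCounts(7, 3))); // [3, 2, 2]
        System.out.println(Arrays.toString(assignmentCounts(7, 7))); // [1, 1, 1, 1, 1, 1, 1]
    }
}
```

With multiple topics the total partition count says nothing about per-topic divisibility, which is why a single max parallelism cannot guarantee a skew-free distribution there.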
[jira] [Created] (FLINK-32689) Insufficient validation for table.local-time-zone
Timo Walther created FLINK-32689: Summary: Insufficient validation for table.local-time-zone Key: FLINK-32689 URL: https://issues.apache.org/jira/browse/FLINK-32689 Project: Flink Issue Type: Bug Components: Table SQL / API Reporter: Timo Walther Assignee: Timo Walther There are still cases where timezone information is lost silently due to the interaction between {{java.util.TimeZone}} and {{java.time.ZoneId}}. This might be a theoretical problem, but I would feel safer if we change the check to: {code} if (!java.util.TimeZone.getTimeZone(zoneId).toZoneId().equals(ZoneId.of(zoneId))) { throw new ValidationException(errorMessage); } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32689) Insufficient validation for table.local-time-zone
[ https://issues.apache.org/jira/browse/FLINK-32689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747557#comment-17747557 ] Timo Walther commented on FLINK-32689: -- [~leonard] Could you give more context why FLINK-22349 was necessary? The check above would fail for {{UTC-1}} and {{UT-1}} (which is also officially supported). But current master accepts {{UT-1}}, is this a bug? > Insufficient validation for table.local-time-zone > - > > Key: FLINK-32689 > URL: https://issues.apache.org/jira/browse/FLINK-32689 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > > There are still cases where timezone information is lost silently due to the > interaction between {{java.util.TimeZone}} and {{java.time.ZoneId}}. > This might be theoretical problem, but I would feel safer if we change the > check to: > {code} > if > (!java.util.TimeZone.getTimeZone(zoneId).toZoneId().equals(ZoneId.of(zoneId))) > { >throw new ValidationException(errorMessage); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32689) Insufficient validation for table.local-time-zone
[ https://issues.apache.org/jira/browse/FLINK-32689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747559#comment-17747559 ] Timo Walther commented on FLINK-32689: -- Update: I could confirm it with this program {code} final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); final StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); DataStream row = env.fromElements(Row.of(Instant.now().toEpochMilli())); tableEnv.createTemporaryView("t", row); tableEnv.getConfig().set(TableConfigOptions.LOCAL_TIME_ZONE, "UT-10"); tableEnv.executeSql("SELECT TO_TIMESTAMP_LTZ(f0, 3) FROM t").print(); {code} Will prepare a fix. > Insufficient validation for table.local-time-zone > - > > Key: FLINK-32689 > URL: https://issues.apache.org/jira/browse/FLINK-32689 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > > There are still cases where timezone information is lost silently due to the > interaction between {{java.util.TimeZone}} and {{java.time.ZoneId}}. > This might be theoretical problem, but I would feel safer if we change the > check to: > {code} > if > (!java.util.TimeZone.getTimeZone(zoneId).toZoneId().equals(ZoneId.of(zoneId))) > { >throw new ValidationException(errorMessage); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
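The round-trip check proposed in this ticket can be demonstrated with the JDK alone, without Flink. The class and method names below are illustrative; the behavior relied on is standard: `java.time.ZoneId` accepts IDs like "UT-10", while `java.util.TimeZone.getTimeZone(String)` does not understand them and silently falls back to GMT, which is exactly the silent loss of offset described above.

```java
import java.time.ZoneId;
import java.util.TimeZone;

// Self-contained JDK demonstration of the proposed round-trip validation.
// TimeZone.getTimeZone(String) returns the GMT zone for IDs it cannot parse,
// so round-tripping through TimeZone and comparing to ZoneId.of detects them.
public class ZoneRoundTrip {
    static boolean isConsistent(String zone) {
        return TimeZone.getTimeZone(zone).toZoneId().equals(ZoneId.of(zone));
    }

    public static void main(String[] args) {
        System.out.println(isConsistent("UTC"));   // true
        System.out.println(isConsistent("UT-10")); // false: TimeZone silently falls back to GMT
    }
}
```

As noted in the comment above, the same check also rejects IDs such as "UTC-1"/"UT-1" that `ZoneId` officially supports, which is the trade-off being discussed.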
[jira] [Commented] (FLINK-32684) Renaming AkkaOptions into PekkoOptions
[ https://issues.apache.org/jira/browse/FLINK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747575#comment-17747575 ] Chesnay Schepler commented on FLINK-32684: -- Related to FLINK-14052. > Renaming AkkaOptions into PekkoOptions > -- > > Key: FLINK-32684 > URL: https://issues.apache.org/jira/browse/FLINK-32684 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Major > Fix For: 2.0.0 > > > FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved > renaming classes (besides updating comments). {{AkkaOptions}} was the only > occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. > This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] hanyuzheng7 commented on pull request #22842: [FLINK-32261][table] Add built-in MAP_UNION function.
hanyuzheng7 commented on PR #22842: URL: https://github.com/apache/flink/pull/22842#issuecomment-1652083503 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32684) Renaming AkkaOptions into PekkoOptions
[ https://issues.apache.org/jira/browse/FLINK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32684: -- Description: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. Update: It's also an option to rename it to a more general name like RpcOptions (see FLINK-14052) because there's a plan was: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. > Renaming AkkaOptions into PekkoOptions > -- > > Key: FLINK-32684 > URL: https://issues.apache.org/jira/browse/FLINK-32684 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Major > Fix For: 2.0.0 > > > FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved > renaming classes (besides updating comments). {{AkkaOptions}} was the only > occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. > This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. > Update: It's also an option to rename it to a more general name like > RpcOptions (see FLINK-14052) because there's a plan -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32684) Renaming AkkaOptions into PekkoOptions
[ https://issues.apache.org/jira/browse/FLINK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32684: -- Description: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}} (or a more general term considering was: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. > Renaming AkkaOptions into PekkoOptions > -- > > Key: FLINK-32684 > URL: https://issues.apache.org/jira/browse/FLINK-32684 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Major > Fix For: 2.0.0 > > > FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved > renaming classes (besides updating comments). {{AkkaOptions}} was the only > occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. > This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}} (or a more > general term considering -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32684) Renaming AkkaOptions into PekkoOptions
[ https://issues.apache.org/jira/browse/FLINK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32684: -- Description: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. was: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. Update: It's also an option to rename it to a more general name like RpcOptions (see FLINK-14052) because there's a plan > Renaming AkkaOptions into PekkoOptions > -- > > Key: FLINK-32684 > URL: https://issues.apache.org/jira/browse/FLINK-32684 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Major > Fix For: 2.0.0 > > > FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved > renaming classes (besides updating comments). {{AkkaOptions}} was the only > occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. > This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32684) Renaming AkkaOptions into PekkoOptions
[ https://issues.apache.org/jira/browse/FLINK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32684: -- Description: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}} (or a more general term considering FLINK-29281) was: FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved renaming classes (besides updating comments). {{AkkaOptions}} was the only occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}} (or a more general term considering > Renaming AkkaOptions into PekkoOptions > -- > > Key: FLINK-32684 > URL: https://issues.apache.org/jira/browse/FLINK-32684 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Major > Fix For: 2.0.0 > > > FLINK-32468 introduced Apache Pekko as a replacement for Akka. This involved > renaming classes (besides updating comments). {{AkkaOptions}} was the only > occurrence that wasn't renamed as it's annotated as {{@PublicEvolving}}. > This issue is about renaming {{AkkaOptions}} into {{PekkoOptions}} (or a more > general term considering FLINK-29281) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-31611) Add delayed restart to failed jobs
[ https://issues.apache.org/jira/browse/FLINK-31611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matyas Orhidi reassigned FLINK-31611: - Assignee: (was: Matyas Orhidi) > Add delayed restart to failed jobs > -- > > Key: FLINK-31611 > URL: https://issues.apache.org/jira/browse/FLINK-31611 > Project: Flink > Issue Type: New Feature > Components: Kubernetes Operator >Reporter: Matyas Orhidi >Priority: Major > > Operator is able to restart failed jobs already using: > {{kubernetes.operator.job.restart.failed: true}} > It's beneficial however to keep a failed job around for a while for > inspection: > {{kubernetes.operator.job.restart.failed.delay: 5m}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32690) Report Double.NAN instead of null for missing autoscaler metrics
Matyas Orhidi created FLINK-32690: - Summary: Report Double.NAN instead of null for missing autoscaler metrics Key: FLINK-32690 URL: https://issues.apache.org/jira/browse/FLINK-32690 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Matyas Orhidi Fix For: kubernetes-operator-1.7.0 Change null values to Double.NAN for autoscaler metrics during blackout periods when no data is gathered. This appears to be a more common practice than null. Also consistent with other metrics we have. -- This message was sent by Atlassian Jira (v8.20.10#820010)
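The idea can be sketched as a small gauge wrapper. This is an illustrative sketch, not the operator's actual {{AutoscalerFlinkMetrics}} code, and the class name is made up; note the Java constant is spelled {{Double.NaN}}:

```java
import java.util.Optional;
import java.util.function.Supplier;

// Sketch: a gauge that reports Double.NaN rather than null when no
// measurement is available, e.g. during a blackout period.
class NanDefaultingGauge {
    private final Supplier<Optional<Double>> source;

    NanDefaultingGauge(Supplier<Optional<Double>> source) {
        this.source = source;
    }

    double getValue() {
        // NaN shows up as "no data" in most metric backends, whereas a null
        // value is something many reporters cannot render at all.
        return source.get().orElse(Double.NaN);
    }
}
```

A consumer can distinguish "no data" from real values with {{Double.isNaN(value)}}, which is why NaN is the conventional sentinel for missing numeric metrics.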
[GitHub] [flink] patricklucas commented on a diff in pull request #22987: [FLINK-32583][rest] Fix deadlock in RestClient
patricklucas commented on code in PR #22987: URL: https://github.com/apache/flink/pull/22987#discussion_r1275249702 ## flink-runtime/src/test/java/org/apache/flink/runtime/rest/RestClientTest.java: ## @@ -207,6 +218,120 @@ public void testRestClientClosedHandling() throws Exception { } } +/** + * Tests that the futures returned by {@link RestClient} fail immediately if the client is + * already closed. + * + * See FLINK-32583 + */ +@Test +public void testCloseClientBeforeRequest() throws Exception { +try (final RestClient restClient = +new RestClient(new Configuration(), Executors.directExecutor())) { +restClient.close(); // Intentionally close the client prior to the request + +CompletableFuture future = +restClient.sendRequest( +unroutableIp, +80, +new TestMessageHeaders(), +EmptyMessageParameters.getInstance(), +EmptyRequestBody.getInstance()); + +// Call get() on the future with a timeout of 0s so we can test that the exception +// thrown is not a TimeoutException, which is what would be thrown if restClient were +// not already closed +final ThrowingRunnable getFuture = () -> future.get(0, TimeUnit.SECONDS); + +final Throwable cause = assertThrows(ExecutionException.class, getFuture).getCause(); +assertThat(cause, instanceOf(IllegalStateException.class)); +assertThat(cause.getMessage(), equalTo("RestClient is already closed")); +} +} + +@Test +public void testCloseClientWhileProcessingRequest() throws Exception { +// Set up a Netty SelectStrategy with latches that allow us to step forward through Netty's +// request state machine, closing the client at a particular moment +final OneShotLatch connectTriggered = new OneShotLatch(); +final OneShotLatch closeTriggered = new OneShotLatch(); +final SelectStrategy fallbackSelectStrategy = +DefaultSelectStrategyFactory.INSTANCE.newSelectStrategy(); +final SelectStrategyFactory selectStrategyFactory = +() -> +(selectSupplier, hasTasks) -> { +connectTriggered.trigger(); +closeTriggered.awaitQuietly(1, TimeUnit.SECONDS); 
Review Comment: Alright, no problem following that guidance—I actually hit it locally and it took a few minutes to track down this instance which should always resolve in milliseconds. I considered the risks but am happy to follow the convention. ## flink-runtime/src/test/java/org/apache/flink/runtime/rest/RestClientTest.java: ## @@ -207,6 +218,120 @@ public void testRestClientClosedHandling() throws Exception { } } +/** + * Tests that the futures returned by {@link RestClient} fail immediately if the client is + * already closed. + * + * See FLINK-32583 + */ +@Test +public void testCloseClientBeforeRequest() throws Exception { +try (final RestClient restClient = +new RestClient(new Configuration(), Executors.directExecutor())) { +restClient.close(); // Intentionally close the client prior to the request + +CompletableFuture future = +restClient.sendRequest( +unroutableIp, +80, +new TestMessageHeaders(), +EmptyMessageParameters.getInstance(), +EmptyRequestBody.getInstance()); + +// Call get() on the future with a timeout of 0s so we can test that the exception +// thrown is not a TimeoutException, which is what would be thrown if restClient were +// not already closed +final ThrowingRunnable getFuture = () -> future.get(0, TimeUnit.SECONDS); + +final Throwable cause = assertThrows(ExecutionException.class, getFuture).getCause(); +assertThat(cause, instanceOf(IllegalStateException.class)); +assertThat(cause.getMessage(), equalTo("RestClient is already closed")); +} +} + +@Test +public void testCloseClientWhileProcessingRequest() throws Exception { +// Set up a Netty SelectStrategy with latches that allow us to step forward through Netty's +// request state machine, closing the client at a particular moment +final OneShotLatch connectTriggered = new OneShotLatch(); +final OneShotLatch closeTriggered = new OneShotLatch(); +final SelectStrategy fallbackSelectStrategy = +DefaultSelectStrategyFactory.INSTANCE.newSelectStrategy(); +final SelectStrategyFactory sel
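The behavior pinned down by {{testCloseClientBeforeRequest}} above, namely that a request against an already-closed client fails fast with {{IllegalStateException}} instead of hanging until a timeout, can be sketched independently of Flink (class and method names here are illustrative, not the actual {{RestClient}} API):

```java
import java.util.concurrent.CompletableFuture;

// Minimal sketch of the fail-fast pattern the test verifies: once closed,
// the client returns an already-failed future instead of attempting I/O.
class ClosableClient {
    private volatile boolean closed = false;

    void close() {
        closed = true;
    }

    CompletableFuture<String> sendRequest() {
        if (closed) {
            CompletableFuture<String> future = new CompletableFuture<>();
            future.completeExceptionally(
                    new IllegalStateException("RestClient is already closed"));
            return future;
        }
        // Real work would happen asynchronously here.
        return CompletableFuture.completedFuture("ok");
    }
}
```

Because the returned future is already completed exceptionally, a caller's `get(0, TimeUnit.SECONDS)` surfaces the `IllegalStateException` (wrapped in an `ExecutionException`) rather than a `TimeoutException`, which is exactly what the test asserts.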
[GitHub] [flink] patricklucas commented on pull request #22987: [FLINK-32583][rest] Fix deadlock in RestClient
patricklucas commented on PR #22987: URL: https://github.com/apache/flink/pull/22987#issuecomment-1652189416 Great, I'll get to this tomorrow, thanks for the reviews @XComp. Regarding the commit organization, I'm happy to break out another hotfix commit—wasn't sure if the branch would get squashed before going into master. Can you confirm that fast-forwarding is the merge strategy? If that's the case, I assume you want me to also fixup the "Changes per review" commits into the primary commit? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] morhidi opened a new pull request, #639: [FLINK-32690] report Double.NAN instead of null for missing autoscaler metrics
morhidi opened a new pull request, #639:
URL: https://github.com/apache/flink-kubernetes-operator/pull/639

## What is the purpose of the change

Change null values to Double.NAN for autoscaler metrics during blackout periods when no data is gathered. This appears to be a more common practice than null. Also consistent with other metrics we have.

## Brief change log

This is a minor fix in `AutoscalerFlinkMetrics`

## Verifying this change

Manually + Unit tests

```
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.cbc357ccb763df2852fee8c4fc7d55f2.RECOMMENDED_PARALLELISM.Current: NaN
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.90bea66de1c231edf33913ecd54406c1.RECOMMENDED_PARALLELISM.Current: NaN
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.cbc357ccb763df2852fee8c4fc7d55f2.MAX_PARALLELISM.Current: NaN
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.90bea66de1c231edf33913ecd54406c1.CATCH_UP_DATA_RATE.Current: NaN
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.cbc357ccb763df2852fee8c4fc7d55f2.PARALLELISM.Current: NaN
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.90bea66de1c231edf33913ecd54406c1.TRUE_PROCESSING_RATE.Average: NaN
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.cbc357ccb763df2852fee8c4fc7d55f2.CATCH_UP_DATA_RATE.Current: NaN
resource.default.autoscaling-example.FlinkDeployment.AutoScaler.jobVertexID.90bea66de1c231edf33913ecd54406c1.MAX_PARALLELISM.Current: NaN
system.FlinkDeployment.Lifecycle.State.CREATED.TimeSeconds: count=0, min=0, max=0, mean=NaN, stddev=NaN, p50=NaN, p75=NaN, p95=NaN, p98=NaN, p99=NaN, p999=NaN
system.FlinkDeployment.Lifecycle.State.ROLLING_BACK.TimeSeconds: count=0, min=0, max=0, mean=NaN, stddev=NaN, p50=NaN, p75=NaN, p95=NaN, p98=NaN, p99=NaN, p999=NaN
namespace.default.FlinkDeployment.Lifecycle.State.STABLE.TimeSeconds: count=0,
min=0, max=0, mean=NaN, stddev=NaN, p50=NaN, p75=NaN, p95=NaN, p98=NaN, p99=NaN, p999=NaN
namespace.default.FlinkDeployment.Lifecycle.State.ROLLING_BACK.TimeSeconds: count=0, min=0, max=0, mean=NaN, stddev=NaN, p50=NaN, p75=NaN, p95=NaN, p98=NaN, p99=NaN, p999=NaN
```

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no
- Core observer or reconciler logic that is regularly executed: no

## Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32690) Report Double.NAN instead of null for missing autoscaler metrics
[ https://issues.apache.org/jira/browse/FLINK-32690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32690: --- Labels: pull-request-available (was: ) > Report Double.NAN instead of null for missing autoscaler metrics > > > Key: FLINK-32690 > URL: https://issues.apache.org/jira/browse/FLINK-32690 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Matyas Orhidi >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.7.0 > > > Change null values to Double.NAN for autoscaler metrics during blackout > periods when no data is gathered. This appears to be a more common practice > than null. Also consistent with other metrics we have. -- This message was sent by Atlassian Jira (v8.20.10#820010)