[GitHub] [flink] flinkbot commented on pull request #14693: [FLINK-19446][canal][json] canal-json has a situation that -U and +U are equal, when updating the null field to be non-null
flinkbot commented on pull request #14693: URL: https://github.com/apache/flink/pull/14693#issuecomment-762669697

## CI report:

* 58b45f883dfd13a36efbf22b17a487fd28d94d90 UNKNOWN

Bot commands: the @flinkbot bot supports the following commands:

- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager
rmetzger commented on a change in pull request #14678: URL: https://github.com/apache/flink/pull/14678#discussion_r559978485

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java

## @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failure tracked in the job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initializes the listener with the JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metric group to which the listener can add customized metric definitions.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment: The `JobManagerJobMetricGroup` contains the JobId. Would it make sense to introduce a `FailureListenerMetricGroup extends MetricGroup` that exposes the JobId and JobName, and is not considered an internal API?
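As a quick illustration of the reviewer's idea (purely hypothetical; no such interface exists in the PR or in Flink), a listener-facing metric group exposing only job metadata could look like this:

```java
import org.apache.flink.api.common.JobID;
import org.apache.flink.metrics.MetricGroup;

/** Hypothetical public-facing metric group for failure listeners. */
public interface FailureListenerMetricGroup extends MetricGroup {

    /** @return the id of the job whose failures the listener observes. */
    JobID getJobId();

    /** @return the name of the job whose failures the listener observes. */
    String getJobName();
}
```

The design point is that listeners would then depend only on the public `MetricGroup` API plus job metadata, rather than on the internal `JobManagerJobMetricGroup`.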
[GitHub] [flink] wangyang0918 commented on pull request #14591: FLINK-20359 Added Owner Reference to Job Manager in native kubernetes
wangyang0918 commented on pull request #14591: URL: https://github.com/apache/flink/pull/14591#issuecomment-762671877

I have run a Flink application on minikube with the owner reference of the JobManager deployment set to a newly created service (e.g. my-test-svc). After the service was deleted, all of the application's K8s resources, including the JobManager deployment, were deleted by GC. This PR works well.
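For readers unfamiliar with owner references, here is a rough sketch of the mechanism the PR verifies, written against the fabric8 Kubernetes client model types (illustrative only; the helper class and method names are made up, and my-test-svc is a placeholder):

```java
import java.util.Collections;

import io.fabric8.kubernetes.api.model.OwnerReference;
import io.fabric8.kubernetes.api.model.OwnerReferenceBuilder;
import io.fabric8.kubernetes.api.model.Service;
import io.fabric8.kubernetes.api.model.apps.Deployment;

public class OwnerReferenceExample {

    /** Makes the JobManager deployment a dependent of the given service. */
    static void setOwner(Deployment jobManagerDeployment, Service owner) {
        OwnerReference ref = new OwnerReferenceBuilder()
                .withApiVersion(owner.getApiVersion())
                .withKind(owner.getKind())
                .withName(owner.getMetadata().getName())
                .withUid(owner.getMetadata().getUid())
                // once the owner is deleted, the K8s garbage collector
                // removes the deployment and its dependents
                .withBlockOwnerDeletion(true)
                .withController(false)
                .build();
        jobManagerDeployment.getMetadata()
                .setOwnerReferences(Collections.singletonList(ref));
    }
}
```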
[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager
rmetzger commented on a change in pull request #14678: URL: https://github.com/apache/flink/pull/14678#discussion_r559979096

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/ExecutionFailureHandler.java

## @@ -98,11 +103,26 @@ public FailureHandlingResult getGlobalFailureHandlingResult(final Throwable caus
                 true);
     }

+    /** @param failureListener the failure listener to be registered */
+    public void registerFailureListener(FailureListener failureListener) {
+        failureListeners.add(failureListener);
+    }
+
     private FailureHandlingResult handleFailure(
             final Throwable cause,
             final Set<ExecutionVertexID> verticesToRestart,
             final boolean globalFailure) {
+        try {
+            for (FailureListener listener : failureListeners) {
+                listener.onFailure(cause, globalFailure);
+            }
+        } catch (Throwable e) {
+            return FailureHandlingResult.unrecoverable(
+                    new JobException("The failure in failure listener is not recoverable", e),

Review comment:
```suggestion
                    new JobException("Unexpected exception in FailureListener", e),
```
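To make the new registration hook concrete, here is an illustrative listener implementation (not part of the PR; it assumes the `Plugin` super-interface declares no abstract methods) that could be passed to `registerFailureListener`:

```java
import org.apache.flink.metrics.Counter;
import org.apache.flink.runtime.executiongraph.FailureListener;
import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;

/** Illustrative listener that counts global vs. local failures as metrics. */
public class CountingFailureListener implements FailureListener {

    private Counter globalFailures;
    private Counter localFailures;

    @Override
    public void init(JobManagerJobMetricGroup metricGroup) {
        // register two custom counters on the job's metric group
        globalFailures = metricGroup.counter("numGlobalFailureListenerHits");
        localFailures = metricGroup.counter("numLocalFailureListenerHits");
    }

    @Override
    public void onFailure(Throwable cause, boolean globalFailure) {
        // bump the matching counter for every failure handled by the job manager
        (globalFailure ? globalFailures : localFailures).inc();
    }
}
```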
[GitHub] [flink] flinkbot edited a comment on pull request #10847: [FLINK-15578][connectors/jdbc] implement exactly once JDBC sink
flinkbot edited a comment on pull request #10847: URL: https://github.com/apache/flink/pull/10847#issuecomment-573933799

## CI report:

* 88b27b180a90899b60afbfd12c2930939f76a3a4 UNKNOWN
* 51794ed8d24485bee01645ee15e7ecc1b8d13f8f Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12203)
* 6f27572110fbc02a1f0ad65260212b621b928dfe UNKNOWN

Bot commands: the @flinkbot bot supports the following commands:

- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-21012) AvroFileFormatFactory uses non-deserializable lambda function
[ https://issues.apache.org/jira/browse/FLINK-21012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-21012:
-----------------------------------
    Fix Version/s:     (was: 1.12.3)
                   1.12.2

> AvroFileFormatFactory uses non-deserializable lambda function
> --------------------------------------------------------------
>
>                 Key: FLINK-21012
>                 URL: https://issues.apache.org/jira/browse/FLINK-21012
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.12.0, 1.12.1
>            Reporter: Ingo Bürk
>            Priority: Critical
>             Fix For: 1.13.0, 1.12.2
>
> In AvroFileFormatFactory#RowDataAvroWriterFactory a lambda function is used
> to create the factory. This can cause
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Invalid lambda deserialization{code}
> There are other similar issues, like FLINK-20147, FLINK-18857 and FLINK-18006,
> and the solution so far seems to have been to replace the lambda with an
> anonymous class instead.
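For background, a minimal sketch (hypothetical names, unrelated to the actual factory code) of why swapping the lambda for an anonymous class helps: a serializable lambda is serialized via the SerializedLambda machinery bound to its capturing class, while an anonymous class serializes like any ordinary named class.

```java
import java.io.Serializable;

public class LambdaVsAnonymous {

    /** Hypothetical factory SAM type, analogous to the writer factory in the ticket. */
    interface WriterFactory extends Serializable {
        String createWriter(String path);
    }

    // Fragile: serialized as a SerializedLambda tied to this enclosing class;
    // deserialization can fail with "Invalid lambda deserialization" when the
    // class is recompiled or loaded differently.
    static final WriterFactory LAMBDA = path -> "writer for " + path;

    // Robust: an anonymous class is serialized like a normal named class.
    static final WriterFactory ANONYMOUS =
            new WriterFactory() {
                @Override
                public String createWriter(String path) {
                    return "writer for " + path;
                }
            };
}
```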
[jira] [Updated] (FLINK-21022) flink-connector-es: add onSuccess handler after bulk process to sync succeeded data to a third-party system for data-consistency checking
[ https://issues.apache.org/jira/browse/FLINK-21022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-21022:
-----------------------------------
    Fix Version/s:     (was: 1.12.3)
                   1.12.2

> flink-connector-es: add onSuccess handler after bulk process to sync succeeded
> data to a third-party system for data-consistency checking
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-21022
>                 URL: https://issues.apache.org/jira/browse/FLINK-21022
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / ElasticSearch
>    Affects Versions: 1.12.0
>            Reporter: Zheng WEI
>            Priority: Major
>             Fix For: 1.13.0, 1.12.2
>
> flink-connector-es should add an onSuccess handler that is invoked after a
> successful bulk process, in order to sync the succeeded data to a third-party
> system for data-consistency checking. By default the onSuccess implementation
> is a no-op; users can set their own onSuccess handler when needed.
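A rough sketch of the proposed hook (a hypothetical interface; nothing with this name exists in flink-connector-elasticsearch), expressed with the Elasticsearch bulk types the connector already uses:

```java
import java.io.Serializable;

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;

/** Hypothetical shape of the hook proposed in the ticket. */
public interface BulkSuccessHandler extends Serializable {

    /** Called after a bulk request completes without failures. */
    default void onSuccess(BulkRequest request, BulkResponse response) {
        // no-op by default; users override this to forward acknowledged
        // documents to a third-party system for consistency checking
    }
}
```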
[GitHub] [flink] flinkbot edited a comment on pull request #14692: [FLINK-20944][k8s] Do not resolve the rest endpoint address when the service exposed type is ClusterIP
flinkbot edited a comment on pull request #14692: URL: https://github.com/apache/flink/pull/14692#issuecomment-762669595

## CI report:

* 48ef644b381d5f9f0dd030857fcaab5e80ab4a97 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12220)

Bot commands: the @flinkbot bot supports the following commands:

- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on pull request #14693: [FLINK-19446][canal][json] canal-json has a situation that -U and +U are equal, when updating the null field to be non-null
flinkbot edited a comment on pull request #14693: URL: https://github.com/apache/flink/pull/14693#issuecomment-762669697

## CI report:

* 58b45f883dfd13a36efbf22b17a487fd28d94d90 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12221)

Bot commands: the @flinkbot bot supports the following commands:

- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-21023) Task Manager uses the container dir of Job Manager when running flink job on yarn-cluster.
[ https://issues.apache.org/jira/browse/FLINK-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tang Yan updated FLINK-21023:
------------------------------
    Description:

I want to try to use option -yt (yarnship) to distribute my config files to the Yarn cluster, and read the file in code. I just used the flink example wordcount.

Here is my submit command:

/opt/Flink/bin/flink run -m yarn-cluster -p 1 -yt /path/to/conf -c org.apache.flink.examples.java.wordcount.WordCount /opt/Flink/examples/batch/WordCount.jar --input conf/cmp_online.cfg

Test Result:

I found that if the job manager and task manager are launched on the same node, the job can run successfully. But when they're running on different nodes, the job will fail with the errors below. I found the conf folder has been distributed to the container cache dirs, such as [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf] on the job manager node, and [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_02/conf|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_02/conf] on the task manager node. But why does the task manager load the conf file from the container_eXXX_01 path (which is located on the job manager node)?

2021-01-19 04:19:11,405 INFO org.apache.flink.yarn.YarnResourceManager [] - Registering TaskManager with ResourceID container_e283_1609125504851_3620_01_02 (akka.tcp://fl...@rphf1hsn026.qa.webex.com:46785/user/rpc/taskmanager_0) at ResourceManager
2021-01-19 04:19:11,506 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from SCHEDULED to DEPLOYING.
2021-01-19 04:19:11,507 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (attempt #0) to container_e283_1609125504851_3620_01_02 @ rphf1hsn026.qa.webex.com (dataPort=46647)
2021-01-19 04:19:11,608 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from DEPLOYING to RUNNING.
2021-01-19 04:19:11,792 INFO org.apache.flink.api.common.io.LocatableInputSplitAssigner [] - Assigning remote split to host rphf1hsn026
2021-01-19 04:19:11,847 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@3e19cc76.
java.io.IOException: Error opening the Input Split [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg] [0,71]: /data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg (No such file or directory) at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_272] Caused by: java.io.FileNotFoundException: /data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg (No such file or direc
[jira] [Updated] (FLINK-21023) Task Manager uses the container dir of Job Manager when running flink job on yarn-cluster.
[ https://issues.apache.org/jira/browse/FLINK-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tang Yan updated FLINK-21023:
------------------------------
    Description:

I want to try to use option -yt (yarnship) to distribute my config files to the Yarn cluster, and read the file in code. I just used the flink example wordcount.

Here is my submit command:

/opt/Flink/bin/flink run -m yarn-cluster -p 1 -yt /path/to/conf -c org.apache.flink.examples.java.wordcount.WordCount /opt/Flink/examples/batch/WordCount.jar --input ./conf/cmp_online.cfg

Test Result:

I found that if the job manager and task manager are launched on the same node, the job can run successfully. But when they're running on different nodes, the job will fail with the errors below. I found the conf folder has been distributed to the container cache dirs, such as [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf] on the job manager node, and [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_02/conf|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_02/conf] on the task manager node. But why does the task manager load the conf file from the container_eXXX_01 path (which is located on the job manager node)?

2021-01-19 04:19:11,405 INFO org.apache.flink.yarn.YarnResourceManager [] - Registering TaskManager with ResourceID container_e283_1609125504851_3620_01_02 (akka.tcp://fl...@rphf1hsn026.qa.webex.com:46785/user/rpc/taskmanager_0) at ResourceManager
2021-01-19 04:19:11,506 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from SCHEDULED to DEPLOYING.
2021-01-19 04:19:11,507 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (attempt #0) to container_e283_1609125504851_3620_01_02 @ rphf1hsn026.qa.webex.com (dataPort=46647)
2021-01-19 04:19:11,608 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from DEPLOYING to RUNNING.
2021-01-19 04:19:11,792 INFO org.apache.flink.api.common.io.LocatableInputSplitAssigner [] - Assigning remote split to host rphf1hsn026
2021-01-19 04:19:11,847 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@3e19cc76.
java.io.IOException: Error opening the Input Split [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg] [0,71]: /data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg (No such file or directory) at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_272] Caused by: java.io.FileNotFoundException: /data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg (No such file or dir
[jira] [Updated] (FLINK-21023) Task Manager uses the container dir of Job Manager when running flink job on yarn-cluster.
[ https://issues.apache.org/jira/browse/FLINK-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tang Yan updated FLINK-21023:
------------------------------
    Component/s: Deployment / YARN

> Task Manager uses the container dir of Job Manager when running flink job on
> yarn-cluster.
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-21023
>                 URL: https://issues.apache.org/jira/browse/FLINK-21023
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Deployment / YARN
>    Affects Versions: 1.12.0, 1.11.1
>            Reporter: Tang Yan
>            Priority: Critical
>
> I want to try to use option -yt (yarnship) to distribute my config files to
> the Yarn cluster, and read the file in code. I just used the flink example
> wordcount.
> Here is my submit command:
> /opt/Flink/bin/flink run -m yarn-cluster -p 1 -yt /path/to/conf -c
> org.apache.flink.examples.java.wordcount.WordCount
> /opt/Flink/examples/batch/WordCount.jar --input ./conf/cmp_online.cfg
> Test Result:
> I found that if the job manager and task manager are launched on the same
> node, the job can run successfully. But when they're running on different
> nodes, the job will fail with the errors below. I found the conf folder has been
> distributed to the container cache dirs, such as
> [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf]
> on the job manager node, and
> [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_02/conf|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_02/conf]
> on the task manager node. But why does the task manager load the conf file
> from the container_eXXX_01 path (which is located on the job manager node)?
> 2021-01-19 04:19:11,405 INFO org.apache.flink.yarn.YarnResourceManager [] - Registering TaskManager with ResourceID container_e283_1609125504851_3620_01_02 (akka.tcp://fl...@rphf1hsn026.qa.webex.com:46785/user/rpc/taskmanager_0) at ResourceManager
> 2021-01-19 04:19:11,506 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from SCHEDULED to DEPLOYING.
> 2021-01-19 04:19:11,507 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (attempt #0) to container_e283_1609125504851_3620_01_02 @ rphf1hsn026.qa.webex.com (dataPort=46647)
> 2021-01-19 04:19:11,608 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from DEPLOYING to RUNNING.
2021-01-19 04:19:11,792 INFO > org.apache.flink.api.common.io.LocatableInputSplitAssigner [] - Assigning > remote split to host rphf1hsn026 2021-01-19 04:19:11,847 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource > (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) > -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at > main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched > from RUNNING to FAILED on > org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@3e19cc76. > java.io.IOException: Error opening the Input Split > [file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg|file:///data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg] > [0,71]: > /data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg > (No such file or directory) at > org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824) > ~[flink-dist_2.11-1.11.1.jar:1.11.1] at > org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470) > ~[flink-dist_2.11-1.11.1.jar:1.11.1] at > org.apache.flink.api.common.io.DelimitedInputFormat.open(Delimite
[jira] [Created] (FLINK-21023) Task Manager uses the container dir of Job Manager when running flink job on yarn-cluster.
Tang Yan created FLINK-21023:
---------------------------------
             Summary: Task Manager uses the container dir of Job Manager when running flink job on yarn-cluster.
                 Key: FLINK-21023
                 URL: https://issues.apache.org/jira/browse/FLINK-21023
             Project: Flink
          Issue Type: Bug
          Components: Client / Job Submission
    Affects Versions: 1.11.1, 1.12.0
            Reporter: Tang Yan

I want to try to use option -yt (yarnship) to distribute my config files to the Yarn cluster, and read the file in code. I just used the flink example wordcount.

Here is my submit command:

/opt/Flink/bin/flink run -m yarn-cluster -p 1 -yt /path/to/conf -c org.apache.flink.examples.java.wordcount.WordCount /opt/Flink/examples/batch/WordCount.jar --input conf/cmp_online.cfg

Test Result:

I found that if the job manager and task manager are launched on the same node, the job can run successfully. But when they're running on different nodes, the job will fail with the errors below. I found the conf folder has been distributed to the container cache dirs, such as file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf on the job manager node, and file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_02/conf on the task manager node. But why does the task manager load the conf file from the container_eXXX_01 path (which is located on the job manager node)?

2021-01-19 04:19:11,405 INFO org.apache.flink.yarn.YarnResourceManager [] - Registering TaskManager with ResourceID container_e283_1609125504851_3620_01_02 (akka.tcp://fl...@rphf1hsn026.qa.webex.com:46785/user/rpc/taskmanager_0) at ResourceManager
2021-01-19 04:19:11,506 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from SCHEDULED to DEPLOYING.
2021-01-19 04:19:11,507 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (attempt #0) to container_e283_1609125504851_3620_01_02 @ rphf1hsn026.qa.webex.com (dataPort=46647)
2021-01-19 04:19:11,608 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from DEPLOYING to RUNNING.
2021-01-19 04:19:11,792 INFO org.apache.flink.api.common.io.LocatableInputSplitAssigner [] - Assigning remote split to host rphf1hsn026
2021-01-19 04:19:11,847 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/1) (10cf614584617de770c6fd0f0aad4db7) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@3e19cc76.
java.io.IOException: Error opening the Input Split file:/data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg [0,71]: /data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg (No such file or directory) at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_272] Caused by: java.io.FileNotFoundException: /data/d1/yarn/nm/usercache/yanta/appcache/application_1609125504851_3620/container_e283_1609125504851_3620_01_01/conf/cmp_online.cfg (No such file or directory) at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_272] at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_272] at java.io.FileInpu
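An aside on the report above: when files are shipped with -yt, YARN localizes them into each container's own working directory, so resolving them by relative path sidesteps absolute container-cache paths entirely. A minimal sketch (hypothetical helper, not from the ticket):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShippedConfigReader {

    /** Reads a -yt shipped file relative to the container's working directory. */
    public static String readShippedConfig() throws IOException {
        // Each YARN container localizes shipped resources into its own
        // working directory, so a relative path resolves correctly on both
        // the JobManager and the TaskManager containers, unlike an absolute
        // /data/.../container_.../ path captured on one node.
        Path config = Paths.get("conf", "cmp_online.cfg");
        return new String(Files.readAllBytes(config), StandardCharsets.UTF_8);
    }
}
```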
[GitHub] [flink] flinkbot edited a comment on pull request #10847: [FLINK-15578][connectors/jdbc] implement exactly once JDBC sink
flinkbot edited a comment on pull request #10847: URL: https://github.com/apache/flink/pull/10847#issuecomment-573933799

## CI report:

* 88b27b180a90899b60afbfd12c2930939f76a3a4 UNKNOWN
* 51794ed8d24485bee01645ee15e7ecc1b8d13f8f Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12203)
* 6f27572110fbc02a1f0ad65260212b621b928dfe Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12219)

Bot commands: the @flinkbot bot supports the following commands:

- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-20495) Elasticsearch6DynamicSinkITCase Hang
[ https://issues.apache.org/jira/browse/FLINK-20495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Wysakowicz updated FLINK-20495:
-------------------------------------
    Priority: Blocker  (was: Major)

> Elasticsearch6DynamicSinkITCase Hang
> ------------------------------------
>
>                 Key: FLINK-20495
>                 URL: https://issues.apache.org/jira/browse/FLINK-20495
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / ElasticSearch
>    Affects Versions: 1.13.0
>            Reporter: Huang Xingbo
>            Priority: Blocker
>              Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10535&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=03dca39c-73e8-5aaf-601d-328ae5c35f20]
>
> {code:java}
> 2020-12-04T22:39:33.9748225Z [INFO] Running
> org.apache.flink.streaming.connectors.elasticsearch.table.Elasticsearch6DynamicSinkITCase
> 2020-12-04T22:54:51.9486410Z
> ==============================
> 2020-12-04T22:54:51.9488766Z Process produced no output for 900 seconds.
> {code}
[jira] [Commented] (FLINK-20811) Support HTTP paths for yarn ship files/archives
[ https://issues.apache.org/jira/browse/FLINK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267746#comment-17267746 ]

Xintong Song commented on FLINK-20811:
--------------------------------------
Thank you both for offering to contribute, [~nicholasjiang] and [~sanath.mv].

[~nicholasjiang], I noticed that you are currently assigned to 16 open issues in the Flink project, and 9 issues in other projects. I would kindly suggest resolving some of these issues before taking on more tickets. So if you agree, I'd like to assign this ticket to [~sanath.mv].

> Support HTTP paths for yarn ship files/archives
> -----------------------------------------------
>
>                 Key: FLINK-20811
>                 URL: https://issues.apache.org/jira/browse/FLINK-20811
>             Project: Flink
>          Issue Type: New Feature
>          Components: Deployment / YARN
>            Reporter: Xintong Song
>            Priority: Major
>
> Flink's Yarn integration supports shipping workload-specific local
> files/directories/archives to the Yarn cluster.
> As discussed in FLINK-20505, it would be helpful to support directly
> downloading contents from HTTP paths to the Yarn cluster, so that users won't
> need to first download the contents locally and then upload them to the Yarn
> cluster.
[jira] [Created] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
Matthias created FLINK-21024:
---------------------------------
             Summary: Dynamic properties get exposed to job's main method if user parameters are passed
                 Key: FLINK-21024
                 URL: https://issues.apache.org/jira/browse/FLINK-21024
             Project: Flink
          Issue Type: Bug
            Reporter: Matthias

A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code.

I was able to reproduce this issue by slightly adapting the WordCount example (see attached WordCount2.jar).

Initiating a standalone job without using the `--input` parameter will result in printing an empty array:

```
./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2
```

The corresponding `*.out` file looks like this:

```
[]
Executing WordCount2 example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
```

In contrast, initiating the standalone job using the `--input` parameter will expose the dynamic properties:

```
./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml
```

Resulting in the following output:

```
[--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b]
Printing result to stdout. Use --output to specify output path.
```

Interestingly, this cannot be reproduced on a local standalone session cluster.
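For reference, a hypothetical reconstruction of the "slightly adapted" WordCount2 used to reproduce the bug; the only relevant change is printing the raw argument array before the job runs:

```java
import java.util.Arrays;

// Hypothetical reconstruction, not the actual attached jar's source.
public class WordCount2 {
    public static void main(String[] args) throws Exception {
        // With the bug present, args unexpectedly also contains the
        // -D jobmanager.memory.* dynamic properties shown in the report.
        System.out.println(Arrays.toString(args));
        // ... the original WordCount pipeline would follow here ...
    }
}
```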
[jira] [Updated] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias updated FLINK-21024: - Description: A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code. I was able to reproduce this issue by slightly adapting the WordCount example (see attached WordCount2.jar). Initiating a standalone job without using the `--input` parameter will result in printing an empty array: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 {code} The corresponding `*.out` file looks like this: {code} [] Executing WordCount2 example with default input data set. Use --input to specify file input. Printing result to stdout. Use --output to specify output path. {code} In contrast, initiating the standalone job using the `--input` parameter will expose the dynamic properties: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml {code} Resulting in the following output: {code} [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b] Printing result to stdout. Use --output to specify output path. {code} Interestingly, this cannot be reproduced on a local standalone session cluster. was: A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code. I was able to reproduce this issue by slightly adapting the WordCount example (see attached WordCount2.jar). Initiating a standalone job without using the `--input` parameter will result in printing an empty array: ``` ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 ``` The corresponding `*.out` file looks like this: ``` [] Executing WordCount2 example with default input data set. Use --input to specify file input. Printing result to stdout. Use --output to specify output path. ``` In contrast, initiating the standalone job using the `--input` parameter will expose the dynamic properties: ``` ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml ``` Resulting in the following output: ``` [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b] Printing result to stdout. Use --output to specify output path. ``` Interestingly, this cannot be reproduced on a local standalone session cluster. 
> Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug >Reporter: Matthias >Priority: Major > Labels: starter > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > (see attached WordCount2.jar). > Initiating a standalone job without using the `--input` parameter will result > in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding `*.out` file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > {code} > In contrast, initiating the standalone job using the `--input` parameter will > expose the dynamic properties: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 --input > /opt/flink/config/flink-conf.yaml > {code} > Resulting in the following output: > {code}
[jira] [Created] (FLINK-21025) SQLClientHBaseITCase fails when untarring HBase
Dawid Wysakowicz created FLINK-21025: Summary: SQLClientHBaseITCase fails when untarring HBase Key: FLINK-21025 URL: https://issues.apache.org/jira/browse/FLINK-21025 Project: Flink Issue Type: Bug Components: Connectors / HBase, Table SQL / Client, Tests Affects Versions: 1.13.0 Reporter: Dawid Wysakowicz Fix For: 1.13.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12210&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 {code} [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 908.614 s <<< FAILURE! - in org.apache.flink.tests.util.hbase.SQLClientHBaseITCase Jan 19 08:19:36 [ERROR] testHBase[1: hbase-version:2.2.3](org.apache.flink.tests.util.hbase.SQLClientHBaseITCase) Time elapsed: 615.099 s <<< ERROR! Jan 19 08:19:36 java.io.IOException: Jan 19 08:19:36 Process execution failed due error. Error output: Jan 19 08:19:36 at org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:133) Jan 19 08:19:36 at org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:108) Jan 19 08:19:36 at org.apache.flink.tests.util.AutoClosableProcess.runBlocking(AutoClosableProcess.java:70) Jan 19 08:19:36 at org.apache.flink.tests.util.hbase.LocalStandaloneHBaseResource.setupHBaseDist(LocalStandaloneHBaseResource.java:86) Jan 19 08:19:36 at org.apache.flink.tests.util.hbase.LocalStandaloneHBaseResource.before(LocalStandaloneHBaseResource.java:76) Jan 19 08:19:36 at org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:46) Jan 19 08:19:36 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) Jan 19 08:19:36 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) Jan 19 08:19:36 at org.junit.rules.RunRules.evaluate(RunRules.java:20) Jan 19 08:19:36 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) Jan 19 08:19:36 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) Jan 19 08:19:36 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) Jan 19 08:19:36 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) Jan 19 08:19:36 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) Jan 19 08:19:36 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) Jan 19 08:19:36 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) Jan 19 08:19:36 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) Jan 19 08:19:36 at org.junit.runners.ParentRunner.run(ParentRunner.java:363) Jan 19 08:19:36 at org.junit.runners.Suite.runChild(Suite.java:128) Jan 19 08:19:36 at org.junit.runners.Suite.runChild(Suite.java:27) Jan 19 08:19:36 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) Jan 19 08:19:36 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) Jan 19 08:19:36 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) Jan 19 08:19:36 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) Jan 19 08:19:36 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) Jan 19 08:19:36 at org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:48) Jan 19 08:19:36 at org.junit.rules.RunRules.evaluate(RunRules.java:20) Jan 19 08:19:36 at org.junit.runners.ParentRunner.run(ParentRunner.java:363) Jan 19 08:19:36 at org.junit.runners.Suite.runChild(Suite.java:128) Jan 19 08:19:36 at 
org.junit.runners.Suite.runChild(Suite.java:27) Jan 19 08:19:36 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) Jan 19 08:19:36 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) Jan 19 08:19:36 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) Jan 19 08:19:36 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) Jan 19 08:19:36 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) Jan 19 08:19:36 at org.junit.runners.ParentRunner.run(ParentRunner.java:363) Jan 19 08:19:36 at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55) Jan 19 08:19:36 at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137) Jan 19 08:19:36 at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107) Jan 19 08:19:36 at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrappe
[GitHub] [flink] rmetzger opened a new pull request #14694: [FLINK-19158][e2e] Fix wget timeout mechanism and cache config
rmetzger opened a new pull request #14694: URL: https://github.com/apache/flink/pull/14694

## What is the purpose of the change

Increase the stability of the Java e2e tests.
[jira] [Updated] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias updated FLINK-21024:
-----------------------------
    Attachment: WordCount.jar

> Dynamic properties get exposed to job's main method if user parameters are passed
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-21024
>                 URL: https://issues.apache.org/jira/browse/FLINK-21024
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Matthias
>            Priority: Major
>              Labels: starter
>         Attachments: WordCount.jar
>
> A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code.
> I was able to reproduce this issue by slightly adapting the WordCount example (see attached WordCount2.jar).
> Initiating a standalone job without using the `--input` parameter will result in printing an empty array:
> {code}
> ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2
> {code}
> The corresponding `*.out` file looks like this:
> {code}
> []
> Executing WordCount2 example with default input data set.
> Use --input to specify file input.
> Printing result to stdout. Use --output to specify output path.
> {code}
> In contrast, initiating the standalone job using the `--input` parameter will expose the dynamic properties:
> {code}
> ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml
> {code}
> Resulting in the following output:
> {code}
> [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b]
> Printing result to stdout. Use --output to specify output path.
> {code}
> Interestingly, this cannot be reproduced on a local standalone session cluster.
[jira] [Commented] (FLINK-21025) SQLClientHBaseITCase fails when untarring HBase
[ https://issues.apache.org/jira/browse/FLINK-21025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267755#comment-17267755 ] Dawid Wysakowicz commented on FLINK-21025: -- Could be related to https://issues.apache.org/jira/browse/FLINK-19158 > SQLClientHBaseITCase fails when untarring HBase > --- > > Key: FLINK-21025 > URL: https://issues.apache.org/jira/browse/FLINK-21025 > Project: Flink > Issue Type: Bug > Components: Connectors / HBase, Table SQL / Client, Tests >Affects Versions: 1.13.0 >Reporter: Dawid Wysakowicz >Priority: Critical > Labels: test-stability > Fix For: 1.13.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12210&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > {code} > [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 908.614 s <<< FAILURE! - in > org.apache.flink.tests.util.hbase.SQLClientHBaseITCase > Jan 19 08:19:36 [ERROR] testHBase[1: > hbase-version:2.2.3](org.apache.flink.tests.util.hbase.SQLClientHBaseITCase) > Time elapsed: 615.099 s <<< ERROR! > Jan 19 08:19:36 java.io.IOException: > Jan 19 08:19:36 Process execution failed due error. Error output: > Jan 19 08:19:36 at > org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:133) > Jan 19 08:19:36 at > org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:108) > Jan 19 08:19:36 at > org.apache.flink.tests.util.AutoClosableProcess.runBlocking(AutoClosableProcess.java:70) > Jan 19 08:19:36 at > org.apache.flink.tests.util.hbase.LocalStandaloneHBaseResource.setupHBaseDist(LocalStandaloneHBaseResource.java:86) > Jan 19 08:19:36 at > org.apache.flink.tests.util.hbase.LocalStandaloneHBaseResource.before(LocalStandaloneHBaseResource.java:76) > Jan 19 08:19:36 at > org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:46) > Jan 19 08:19:36 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > Jan 19 08:19:36 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > Jan 19 08:19:36 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > Jan 19 08:19:36 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > Jan 19 08:19:36 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > Jan 19 08:19:36 at org.junit.runners.Suite.runChild(Suite.java:128) > Jan 19 08:19:36 at org.junit.runners.Suite.runChild(Suite.java:27) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > Jan 19 08:19:36 
at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > Jan 19 08:19:36 at > org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:48) > Jan 19 08:19:36 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > Jan 19 08:19:36 at org.junit.runners.Suite.runChild(Suite.java:128) > Jan 19 08:19:36 at org.junit.runners.Suite.runChild(Suite.java:27) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > Jan 19 08:19:36 at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > J
[GitHub] [flink] flinkbot commented on pull request #14694: [FLINK-19158][e2e] Fix wget timeout mechanism and cache config
flinkbot commented on pull request #14694: URL: https://github.com/apache/flink/pull/14694#issuecomment-762693087

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

## Automated Checks

Last check on commit 1c8e22e046a15d442e4d0bcdec73c6ea5b4a3f2d (Tue Jan 19 08:46:11 UTC 2021)

**Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

## Review Progress

* ❓ 1. The [description] looks good.
* ❓ 2. There is [consensus] that the contribution should go into Flink.
* ❓ 3. Needs [attention] from.
* ❓ 4. The change fits into the overall [architecture].
* ❓ 5. Overall code [quality] is good.

Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.

The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands: the @flinkbot bot supports the following commands:

- `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until `architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Commented] (FLINK-19132) Failed to start jobs for consuming Secure Kafka after cluster restart
[ https://issues.apache.org/jira/browse/FLINK-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267757#comment-17267757 ]

Suo L. commented on FLINK-19132:
--------------------------------
Seeing the same issue with 1.10.2 + k8s; temporarily solved by recreating the cluster.

{noformat}
Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/common/security/scram/internals/ScramSaslClient
	at org.apache.kafka.common.security.scram.internals.ScramSaslClient$ScramSaslClientFactory.createSaslClient(ScramSaslClient.java:235)
	at javax.security.sasl.Sasl.createSaslClient(Sasl.java:384)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.lambda$createSaslClient$0(SaslClientAuthenticator.java:180)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslClient(SaslClientAuthenticator.java:176)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.<init>(SaslClientAuthenticator.java:168)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildClientAuthenticator(SaslChannelBuilder.java:254)
	at org.apache.kafka.common.network.SaslChannelBuilder.lambda$buildChannel$1(SaslChannelBuilder.java:202)
	at org.apache.kafka.common.network.KafkaChannel.<init>(KafkaChannel.java:140)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:210)
	at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:334)
	at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:325)
	at org.apache.kafka.common.network.Selector.connect(Selector.java:257)
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:920)
	at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:474)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
	at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1803)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1771)
	at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:77)
	at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:511)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:1019)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:463)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:458)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:778)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

> Failed to start jobs for consuming Secure Kafka after cluster restart
> ----------------------------------------------------------------------
>
>                 Key: FLINK-19132
>                 URL: https://issues.apache.org/jira/browse/FLINK-19132
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>    Affects Versions: 1.9.1, 1.10.1
>            Reporter: Olivier Zembri
>            Priority: Major
>
> We deploy Flink jobs packaged as fat jar files compiled with Java 1.8 on a
> Flink session cluster in Kubernetes.
> After restarting the Kubernetes cluster, the jobs fail to start and we get
> several NoClassDefFoundError in the Task Manager log.
> *Stack trace*
> {code:java}
> java.lang.NoClassDefFoundE
[jira] [Commented] (FLINK-20811) Support HTTP paths for yarn ship files/archives
[ https://issues.apache.org/jira/browse/FLINK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267759#comment-17267759 ] Nicholas Jiang commented on FLINK-20811: [~xintongsong], I would follow your suggestion. Please help assign this to [~sanath.mv]. > Support HTTP paths for yarn ship files/archives > --- > > Key: FLINK-20811 > URL: https://issues.apache.org/jira/browse/FLINK-20811 > Project: Flink > Issue Type: New Feature > Components: Deployment / YARN >Reporter: Xintong Song >Priority: Major > > Flink's Yarn integration supports shipping workload-specific local > files/directories/archives to the Yarn cluster. > As discussed in FLINK-20505, it would be helpful to support directly > downloading contents from HTTP paths to the Yarn cluster, so that users won't > need to first download the contents locally and then upload it to the Yarn > cluster. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-20811) Support HTTP paths for yarn ship files/archives
[ https://issues.apache.org/jira/browse/FLINK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song reassigned FLINK-20811: Assignee: Venkata S Muppalla > Support HTTP paths for yarn ship files/archives > --- > > Key: FLINK-20811 > URL: https://issues.apache.org/jira/browse/FLINK-20811 > Project: Flink > Issue Type: New Feature > Components: Deployment / YARN >Reporter: Xintong Song >Assignee: Venkata S Muppalla >Priority: Major > > Flink's Yarn integration supports shipping workload-specific local > files/directories/archives to the Yarn cluster. > As discussed in FLINK-20505, it would be helpful to support directly > downloading contents from HTTP paths to the Yarn cluster, so that users won't > need to first download the contents locally and then upload it to the Yarn > cluster. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20811) Support HTTP paths for yarn ship files/archives
[ https://issues.apache.org/jira/browse/FLINK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267762#comment-17267762 ] Xintong Song commented on FLINK-20811: -- [~nicholasjiang], thanks for your understanding. [~sanath.mv], I've assigned you to this ticket. Please make sure to read the following guidelines and go ahead. https://flink.apache.org/contributing/contribute-code.html https://flink.apache.org/contributing/code-style-and-quality-preamble.html > Support HTTP paths for yarn ship files/archives > --- > > Key: FLINK-20811 > URL: https://issues.apache.org/jira/browse/FLINK-20811 > Project: Flink > Issue Type: New Feature > Components: Deployment / YARN >Reporter: Xintong Song >Assignee: Venkata S Muppalla >Priority: Major > > Flink's Yarn integration supports shipping workload-specific local > files/directories/archives to the Yarn cluster. > As discussed in FLINK-20505, it would be helpful to support directly > downloading contents from HTTP paths to the Yarn cluster, so that users won't > need to first download the contents locally and then upload it to the Yarn > cluster. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
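To make the cost of the current workflow concrete, the sketch below shows the manual step this ticket wants to eliminate: downloading an HTTP artifact to a local path before shipping it with the existing yarn.ship-files option. The URL and paths are illustrative only; this is a plain-JDK sketch, not Flink code.
{code:java}
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Downloads a remote artifact so it can be shipped with yarn.ship-files. */
public class ShipFileDownload {
    public static void main(String[] args) throws Exception {
        // Hypothetical artifact location; with FLINK-20811 this URL could be
        // handed to the ship option directly instead of being downloaded first.
        URL source = new URL("https://example.com/artifacts/my-udf.jar");
        Path target = Paths.get("/tmp/my-udf.jar");
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // The local copy is then shipped, e.g.:
        //   flink run -m yarn-cluster -Dyarn.ship-files=/tmp/my-udf.jar ...
        System.out.println("Downloaded to " + target);
    }
}
{code}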
[jira] [Created] (FLINK-21026) Align column list specification with Hive in INSERT statement
Zhenghua Gao created FLINK-21026: Summary: Align column list specification with Hive in INSERT statement Key: FLINK-21026 URL: https://issues.apache.org/jira/browse/FLINK-21026 Project: Flink Issue Type: Bug Components: Table SQL / API Reporter: Zhenghua Gao [HIVE-9481|https://issues.apache.org/jira/browse/HIVE-9481] allows column list specification in INSERT statement. The syntax is: {code:java} INSERT INTO TABLE table_name [PARTITION (partcol1[=val1], partcol2[=val2] ...)] [(column list)] select_statement FROM from_statement {code} Meanwhile, Flink introduces a PARTITION syntax in which the PARTITION clause appears after the COLUMN LIST clause. It looks weird, and luckily we don't support the COLUMN LIST clause yet ([FLINK-18726|https://issues.apache.org/jira/browse/FLINK-18726]). I think it's a good chance to align this with Hive now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
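For reference, the ordering difference is easy to see side by side. The following sketch only prints the two statement shapes; the table and column names are made up, and the Hive-style column list is not accepted by Flink at this point.
{code:java}
public class InsertSyntaxComparison {
    public static void main(String[] args) {
        // Hive (HIVE-9481): the PARTITION clause comes first, then the column list.
        String hiveStyle =
                "INSERT INTO TABLE t PARTITION (dt='2021-01-19') (a, b) "
                        + "SELECT x, y FROM s";
        // Flink's grammar would put the column list before PARTITION,
        // which is the ordering the ticket considers weird.
        String flinkStyle =
                "INSERT INTO t (a, b) PARTITION (dt='2021-01-19') "
                        + "SELECT x, y FROM s";
        System.out.println("Hive:  " + hiveStyle);
        System.out.println("Flink: " + flinkStyle);
    }
}
{code}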
[GitHub] [flink] flinkbot edited a comment on pull request #14689: [FLINK-20500][upsert-kafka] Fix temporal join test
flinkbot edited a comment on pull request #14689: URL: https://github.com/apache/flink/pull/14689#issuecomment-762595621 ## CI report: * 7421043232e94c67d4e44020b8a5a319a3e95a09 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12213) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #14694: [FLINK-19158][e2e] Fix wget timeout mechanism and cache config
flinkbot commented on pull request #14694: URL: https://github.com/apache/flink/pull/14694#issuecomment-762699714 ## CI report: * 1c8e22e046a15d442e4d0bcdec73c6ea5b4a3f2d UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias updated FLINK-21024: - Affects Version/s: 1.12.1 > Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug >Affects Versions: 1.12.1 >Reporter: Matthias >Priority: Major > Labels: starter > Attachments: WordCount.jar > > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > (see attached WordCount2.jar). > Initiating a standalone job without using the `--input` parameter will result > in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding `*.out` file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > {code} > In contrast, initiating the standalone job using the `--input` parameter will > expose the dynamic properties: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 --input > /opt/flink/config/flink-conf.yaml > {code} > Resulting in the following output: > {code} > [--input, /opt/flink/config/flink-conf.yaml, -D, > jobmanager.memory.off-heap.size=134217728b, -D, > jobmanager.memory.jvm-overhead.min=201326592b, -D, > jobmanager.memory.jvm-metaspace.size=268435456b, -D, > jobmanager.memory.heap.size=1073741824b, -D, > jobmanager.memory.jvm-overhead.max=201326592b] > Printing result to stdout. Use --output to specify output path. > {code} > Interestingly, this cannot be reproduced on a local standalone session > cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21026) Align column list specification with Hive in INSERT statement
[ https://issues.apache.org/jira/browse/FLINK-21026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenghua Gao updated FLINK-21026: - Description: HIVE-9481 allows column list specification in INSERT statement. The syntax is: {code:java} INSERT INTO TABLE table_name [PARTITION (partcol1[=val1], partcol2[=val2] ...)] [(column list)] select_statement FROM from_statement {code} Meanwhile, Flink introduces a PARTITION syntax in which the PARTITION clause appears after the COLUMN LIST clause. It looks weird, and luckily we don't support the COLUMN LIST clause yet (FLINK-18726). I think it's a good chance to align this with Hive now. was: [HIVE-9481|https://issues.apache.org/jira/browse/HIVE-9481] allows column list specification in INSERT statement. The syntax is: {code:java} INSERT INTO TABLE table_name [PARTITION (partcol1[=val1], partcol2[=val2] ...)] [(column list)] select_statement FROM from_statement {code} Meanwhile, Flink introduces a PARTITION syntax in which the PARTITION clause appears after the COLUMN LIST clause. It looks weird, and luckily we don't support the COLUMN LIST clause yet ([FLINK-18726|https://issues.apache.org/jira/browse/FLINK-18726]). I think it's a good chance to align this with Hive now. > Align column list specification with Hive in INSERT statement > - > > Key: FLINK-21026 > URL: https://issues.apache.org/jira/browse/FLINK-21026 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Reporter: Zhenghua Gao >Priority: Major > > HIVE-9481 allows column list specification in INSERT statement. The syntax is: > {code:java} > INSERT INTO TABLE table_name > [PARTITION (partcol1[=val1], partcol2[=val2] ...)] > [(column list)] > select_statement FROM from_statement > {code} > Meanwhile, Flink introduces a PARTITION syntax in which the PARTITION clause > appears after the COLUMN LIST clause. It looks weird, and luckily we don't > support the COLUMN LIST clause yet (FLINK-18726). I think it's a good chance to align > this with Hive now. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21026) Align column list specification with Hive in INSERT statement
[ https://issues.apache.org/jira/browse/FLINK-21026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenghua Gao updated FLINK-21026: - Description: HIVE-9481 allows column list specification in INSERT statement. The syntax is: {code:java} INSERT INTO TABLE table_name [PARTITION (partcol1[=val1], partcol2[=val2] ...)] [(column list)] select_statement FROM from_statement {code} Meanwhile, Flink introduces a PARTITION syntax in which the PARTITION clause appears after the COLUMN LIST clause. It looks weird, and luckily we don't support the COLUMN LIST clause yet. I think it's a good chance to align this with Hive now. was: HIVE-9481 allows column list specification in INSERT statement. The syntax is: {code:java} INSERT INTO TABLE table_name [PARTITION (partcol1[=val1], partcol2[=val2] ...)] [(column list)] select_statement FROM from_statement {code} Meanwhile, Flink introduces a PARTITION syntax in which the PARTITION clause appears after the COLUMN LIST clause. It looks weird, and luckily we don't support the COLUMN LIST clause yet (FLINK-18726). I think it's a good chance to align this with Hive now. > Align column list specification with Hive in INSERT statement > - > > Key: FLINK-21026 > URL: https://issues.apache.org/jira/browse/FLINK-21026 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Reporter: Zhenghua Gao >Priority: Major > > HIVE-9481 allows column list specification in INSERT statement. The syntax is: > {code:java} > INSERT INTO TABLE table_name > [PARTITION (partcol1[=val1], partcol2[=val2] ...)] > [(column list)] > select_statement FROM from_statement > {code} > Meanwhile, Flink introduces a PARTITION syntax in which the PARTITION clause > appears after the COLUMN LIST clause. It looks weird, and luckily we don't > support the COLUMN LIST clause yet. I think it's a good chance to align this with > Hive now. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias updated FLINK-21024: - Description: A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code. I was able to reproduce this issue by slightly adapting the WordCount example ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached WordCount.jar). Initiating a standalone job without using the {{--input}} parameter will result in printing an empty array: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 {code} The corresponding {{*.out}} file looks like this: {code} [] Executing WordCount2 example with default input data set. Use --input to specify file input. Printing result to stdout. Use --output to specify output path. {code} In contrast, initiating the standalone job using the {{--input}} parameter will expose the dynamic properties: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml {code} Resulting in the following output: {code} [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b] Printing result to stdout. Use --output to specify output path. {code} Interestingly, this cannot be reproduced on a local standalone session cluster. was: A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code. I was able to reproduce this issue by slightly adapting the WordCount example (see attached WordCount2.jar). Initiating a standalone job without using the `--input` parameter will result in printing an empty array: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 {code} The corresponding `*.out` file looks like this: {code} [] Executing WordCount2 example with default input data set. Use --input to specify file input. Printing result to stdout. Use --output to specify output path. {code} In contrast, initiating the standalone job using the `--input` parameter will expose the dynamic properties: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml {code} Resulting in the following output: {code} [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b] Printing result to stdout. Use --output to specify output path. {code} Interestingly, this cannot be reproduced on a local standalone session cluster. 
> Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.12.1 >Reporter: Matthias >Priority: Major > Labels: starter > Attachments: WordCount.jar > > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached > WordCount.jar). > Initiating a standalone job without using the {{--input}} parameter will > result in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding {{*.out}} file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > {code} > In contrast, initiating the standalone job using the {{--
[jira] [Updated] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias updated FLINK-21024: - Description: A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code. I was able to reproduce this issue by slightly adapting the WordCount example ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached [^WordCount.jar] ). Initiating a standalone job without using the {{--input}} parameter will result in printing an empty array: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 {code} The corresponding {{*.out}} file looks like this: {code} [] Executing WordCount2 example with default input data set. Use --input to specify file input. Printing result to stdout. Use --output to specify output path. {code} In contrast, initiating the standalone job using the {{--input}} parameter will expose the dynamic properties: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml {code} Resulting in the following output: {code} [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b] Printing result to stdout. Use --output to specify output path. {code} Interestingly, this cannot be reproduced on a local standalone session cluster. was: A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey exposing dynamic properties into the job user code. I was able to reproduce this issue by slightly adapting the WordCount example ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached WordCount.jar). Initiating a standalone job without using the {{--input}} parameter will result in printing an empty array: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 {code} The corresponding {{*.out}} file looks like this: {code} [] Executing WordCount2 example with default input data set. Use --input to specify file input. Printing result to stdout. Use --output to specify output path. {code} In contrast, initiating the standalone job using the {{--input}} parameter will expose the dynamic properties: {code} ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml {code} Resulting in the following output: {code} [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b] Printing result to stdout. Use --output to specify output path. {code} Interestingly, this cannot be reproduced on a local standalone session cluster. 
> Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.12.1 >Reporter: Matthias >Priority: Major > Labels: starter > Attachments: WordCount.jar > > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached > [^WordCount.jar] ). > Initiating a standalone job without using the {{--input}} parameter will > result in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding {{*.out}} file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify outpu
[GitHub] [flink] YuvalItzchakov commented on pull request #14633: [FLINK-20961][table-planner-blink] Fix NPE when no assigned timestamp defined
YuvalItzchakov commented on pull request #14633: URL: https://github.com/apache/flink/pull/14633#issuecomment-762709238 @flinkbot run azure This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] weizheng-cloud opened a new pull request #14695: [FLINK-21022] [flink-connector-es] add onSuccess handler after bulk process
weizheng-cloud opened a new pull request #14695: URL: https://github.com/apache/flink/pull/14695 ## What is the purpose of the change *Adds an onSuccess handler after the bulk process that users can override to run additional logic, for example syncing succeeded data to a third-party system for consistency checking.* ## Brief change log *Adds an onSuccess handler after the bulk process; the default onSuccess handler logic is empty.* ## Verifying this change This change added tests and can be verified as follows: - *ElasticsearchSinkBaseTest.testInitWithSuccessHandler()* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
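As a rough sketch of the idea (not the PR's actual API; the interface and class names below are hypothetical), the hook boils down to a callback that defaults to a no-op and can be overridden to forward acknowledged documents to another system:
{code:java}
import java.util.Arrays;
import java.util.List;

/** Hypothetical names; a sketch of the hook shape, not the PR's actual API. */
public class SuccessHandlerSketch {

    /** Called after a bulk request completed successfully. */
    interface SuccessHandler<T> {
        void onSuccess(List<T> acknowledgedDocuments);
    }

    /** Default behavior proposed in the ticket: do nothing. */
    static class NoOpSuccessHandler<T> implements SuccessHandler<T> {
        @Override
        public void onSuccess(List<T> acknowledgedDocuments) {
            // intentionally empty
        }
    }

    public static void main(String[] args) {
        // A user-supplied handler that forwards acknowledged documents
        // for downstream consistency checking.
        SuccessHandler<String> auditHook =
                docs -> docs.forEach(d -> System.out.println("synced for checking: " + d));
        auditHook.onSuccess(Arrays.asList("doc-1", "doc-2"));
    }
}
{code}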
[jira] [Updated] (FLINK-21022) flink-connector-es add onSuccess handler after bulk process for sync success data to other third party system for data consistency checking
[ https://issues.apache.org/jira/browse/FLINK-21022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-21022: --- Labels: pull-request-available (was: ) > flink-connector-es add onSuccess handler after bulk process for sync success > data to other third party system for data consistency checking > --- > > Key: FLINK-21022 > URL: https://issues.apache.org/jira/browse/FLINK-21022 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch >Affects Versions: 1.12.0 >Reporter: Zheng WEI >Priority: Major > Labels: pull-request-available > Fix For: 1.13.0, 1.12.2 > > flink-connector-es adds an onSuccess handler after a successful bulk process, in > order to sync succeeded data to another third-party system for data consistency > checking. By default, the implementation of the onSuccess function is empty; > users can set their own onSuccess handler when needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14633: [FLINK-20961][table-planner-blink] Fix NPE when no assigned timestamp defined
flinkbot edited a comment on pull request #14633: URL: https://github.com/apache/flink/pull/14633#issuecomment-759451108 ## CI report: * 5ba2774d208ac12b1a0d6e67c426cf06211d4e21 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11998) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14637: [FLINK-20949][table-planner-blink] Separate the implementation of sink nodes
flinkbot edited a comment on pull request #14637: URL: https://github.com/apache/flink/pull/14637#issuecomment-759980085 ## CI report: * f2db6f3e6c725c09b24e919ea6cc45537be86d35 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12211) * e02d849c359db923aa91235d9756f055cb44c362 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14694: [FLINK-19158][e2e] Fix wget timeout mechanism and cache config
flinkbot edited a comment on pull request #14694: URL: https://github.com/apache/flink/pull/14694#issuecomment-762699714 ## CI report: * 1c8e22e046a15d442e4d0bcdec73c6ea5b4a3f2d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14693: [FLINK-19446][canal][json] canal-json has a situation that -U and +U are equal, when updating the null field to be non-null
flinkbot edited a comment on pull request #14693: URL: https://github.com/apache/flink/pull/14693#issuecomment-762669697 ## CI report: * 58b45f883dfd13a36efbf22b17a487fd28d94d90 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12221) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #14695: [FLINK-21022] [flink-connector-es] add onSuccess handler after bulk process
flinkbot commented on pull request #14695: URL: https://github.com/apache/flink/pull/14695#issuecomment-762711380 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 598af596abeef70bb8cb3f456fc956dce5758035 (Tue Jan 19 09:19:15 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-21022).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work. Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-21022) flink-connector-es add onSuccess handler after bulk process for sync success data to other third party system for data consistency checking
[ https://issues.apache.org/jira/browse/FLINK-21022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng WEI updated FLINK-21022: -- Affects Version/s: (was: 1.12.0) > flink-connector-es add onSuccess handler after bulk process for sync success > data to other third party system for data consistency checking > --- > > Key: FLINK-21022 > URL: https://issues.apache.org/jira/browse/FLINK-21022 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch >Reporter: Zheng WEI >Priority: Major > Labels: pull-request-available > Fix For: 1.13.0, 1.12.2 > > flink-connector-es adds an onSuccess handler after a successful bulk process, in > order to sync succeeded data to another third-party system for data consistency > checking. By default, the implementation of the onSuccess function is empty; > users can set their own onSuccess handler when needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] tillrohrmann commented on a change in pull request #14686: [FLINK-6042][runtime] Adds exception history
tillrohrmann commented on a change in pull request #14686: URL: https://github.com/apache/flink/pull/14686#discussion_r560025485 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java ##
@@ -270,9 +270,22 @@ private void restartTasksWithDelay(final FailureHandlingResult failureHandlingRe
 delayExecutor.schedule(
         () ->
                 FutureUtils.assertNoException(
-                        cancelFuture.thenRunAsync(
-                                restartTasks(executionVertexVersions, globalRecovery),
-                                getMainThreadExecutor())),
+                        cancelFuture
+                                .thenRunAsync(
+                                        restartTasks(executionVertexVersions, globalRecovery),
+                                        getMainThreadExecutor())
+                                .thenRunAsync(
+                                        () ->
+                                                archiveExceptions(
+                                                        failureHandlingResult.getError(),
+                                                        executionVertexVersions.stream()
+                                                                .map(ExecutionVertexVersion::getExecutionVertexId)
+                                                                .collect(Collectors.toList(),
Review comment: No, actually I think a better way would be to ask the `Executions` which are reset whether they have failed or not and, if they have, to record the `Execution.failureCause`. For the timestamp, you can check whether the state-transition timestamp to the `FAILED` state is good enough. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
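A toy model of the reviewer's suggestion might look like the following. `ExecutionSnapshot` is a hypothetical stand-in for Flink's `Execution`; the real implementation would read the failure cause and the FAILED-state transition timestamp from the executions that were reset.
{code:java}
import java.util.ArrayList;
import java.util.List;

/** Sketch only: record the failure cause of each execution that was reset. */
public class ExceptionHistorySketch {

    static class ExecutionSnapshot {
        final Throwable failureCause; // null if the execution did not fail
        final long failedTimestamp;   // state-transition time to FAILED

        ExecutionSnapshot(Throwable failureCause, long failedTimestamp) {
            this.failureCause = failureCause;
            this.failedTimestamp = failedTimestamp;
        }
    }

    static List<String> archiveFailures(List<ExecutionSnapshot> resetExecutions) {
        List<String> history = new ArrayList<>();
        for (ExecutionSnapshot e : resetExecutions) {
            if (e.failureCause != null) { // only executions that actually failed
                history.add(e.failedTimestamp + ": " + e.failureCause);
            }
        }
        return history;
    }

    public static void main(String[] args) {
        List<ExecutionSnapshot> executions = new ArrayList<>();
        executions.add(new ExecutionSnapshot(null, 0L));
        executions.add(new ExecutionSnapshot(new RuntimeException("task failed"), 42L));
        archiveFailures(executions).forEach(System.out::println);
    }
}
{code}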
[jira] [Updated] (FLINK-21022) flink-connector-es add onSuccess handler after bulk process for sync success data to other third party system for data consistency checking
[ https://issues.apache.org/jira/browse/FLINK-21022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng WEI updated FLINK-21022: -- Fix Version/s: (was: 1.12.2) 1.11.4 > flink-connector-es add onSuccess handler after bulk process for sync success > data to other third party system for data consistency checking > --- > > Key: FLINK-21022 > URL: https://issues.apache.org/jira/browse/FLINK-21022 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch >Reporter: Zheng WEI >Priority: Major > Labels: pull-request-available > Fix For: 1.13.0, 1.11.4 > > flink-connector-es adds an onSuccess handler after a successful bulk process, in > order to sync succeeded data to another third-party system for data consistency > checking. By default, the implementation of the onSuccess function is empty; > users can set their own onSuccess handler when needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20992) Checkpoint cleanup can kill JobMaster
[ https://issues.apache.org/jira/browse/FLINK-20992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267775#comment-17267775 ] Till Rohrmann commented on FLINK-20992: --- I don't think that FLINK-18290 is impossible to solve differently [~roman_khachatryan]. For example, one could use a {{handle}} call in which one first checks whether the {{CheckpointCoordinator}} is still running before enqueuing a new runnable into the {{timer}} {{Executor}}. > Checkpoint cleanup can kill JobMaster > - > > Key: FLINK-20992 > URL: https://issues.apache.org/jira/browse/FLINK-20992 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.12.0 >Reporter: Till Rohrmann >Assignee: Roman Khachatryan >Priority: Critical > Labels: pull-request-available > Fix For: 1.13.0, 1.12.2 > > A user reported that cancelling a job can lead to an uncaught exception which > kills the {{JobMaster}}. The problem seems to be that the > {{CheckpointsCleaner}} might trigger {{CheckpointCoordinator}} actions after > the job has reached a terminal state and, thus, is shut down. Apparently, we > do not properly manage the lifecycles of {{CheckpointCoordinator}} and > checkpoint post clean up actions. > The uncaught exception is
> {code}
> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@41554407 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@5d0ec6f7[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 25977]
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>     at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
>     at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
>     at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622)
>     at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>     at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:62)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.scheduleTriggerRequest(CheckpointCoordinator.java:1152)
>     at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:58)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> cc [~roman_khachatryan]. > https://lists.apache.org/thread.html/r75901008d88163560aabb8ab6fc458cd9d4f19076e03ae85e00f9a3a%40%3Cuser.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
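A minimal sketch of the suggested guard, using plain JDK types rather than Flink's actual fields: the `handle` stage checks a running flag before re-entering the coordinator's timer executor, which avoids the `RejectedExecutionException` from a terminated pool.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only; the flag and task below are stand-ins, not Flink's code. */
public class CleanupGuardSketch {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean coordinatorRunning = new AtomicBoolean(true);

        CompletableFuture<Void> cleanupFuture =
                CompletableFuture.runAsync(() -> System.out.println("cleaning checkpoint"));

        cleanupFuture.handle((ignored, error) -> {
            // Only re-enter the coordinator's executor if it has not shut down;
            // otherwise the submission would throw RejectedExecutionException.
            if (coordinatorRunning.get()) {
                timer.execute(() -> System.out.println("scheduleTriggerRequest"));
            }
            return null;
        });

        Thread.sleep(100);
        timer.shutdown();
    }
}
{code}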
[jira] [Assigned] (FLINK-20944) Launching in application mode requesting a ClusterIP rest service type results in an Exception
[ https://issues.apache.org/jira/browse/FLINK-20944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-20944: - Assignee: Yang Wang > Launching in application mode requesting a ClusterIP rest service type > results in an Exception > -- > > Key: FLINK-20944 > URL: https://issues.apache.org/jira/browse/FLINK-20944 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.12.0 > Environment: Kubernetes 1.17 > Flink 1.12 > Running ./bin/flink from an Ubuntu 18.04 host >Reporter: Jason Brome >Assignee: Yang Wang >Priority: Critical > Labels: pull-request-available > Fix For: 1.13.0, 1.12.2 > > Running a Flink job in Kubernetes in application mode, specifying > kubernetes.rest-service.exposed.type=ClusterIP, results in the job being > started; however, the call to ./bin/flink throws an UnknownHostException > on the client. > Command line: > {{./bin/flink run-application --target kubernetes-application > -Dkubernetes.cluster-id=myjob-qa > -Dkubernetes.container.image=_SOME_REDACTED_PATH/somrepo/someimage_ > -Dkubernetes.service-account=flink-service-account > -Dkubernetes.namespace=myjob-qa > -Dkubernetes.rest-service.exposed.type=ClusterIP local:///opt/flink/usrlib/my-job.jar}} > Output: > 2021-01-12 20:29:19,047 INFO > org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment > requires a fixed port. Configuration blob.server.port will be set to 6124 > 2021-01-12 20:29:19,048 INFO > org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment > requires a fixed port. Configuration taskmanager.rpc.port will be set to 6122 > 2021-01-12 20:29:20,369 ERROR > org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient [] - A > Kubernetes exception occurred.
> java.net.UnknownHostException: myjob-qa-rest.myjob-qa: Name or service not known
>     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_275]
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) ~[?:1.8.0_275]
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) ~[?:1.8.0_275]
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277) ~[?:1.8.0_275]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_275]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_275]
>     at java.net.InetAddress.getByName(InetAddress.java:1077) ~[?:1.8.0_275]
>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:193) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:114) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployApplicationCluster(KubernetesClusterDescriptor.java:185) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:64) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:207) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:974) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) [flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047) [flink-dist_2.12-1.12.0.jar:1.12.0]
>
> The program finished with the following exception:
> java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:118)
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployApplicationCluster(KubernetesClusterDescriptor.java:185)
>     at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:64)
>     at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:207)
>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:974)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047)
[GitHub] [flink] YuvalItzchakov commented on pull request #14633: [FLINK-20961][table-planner-blink] Fix NPE when no assigned timestamp defined
YuvalItzchakov commented on pull request #14633: URL: https://github.com/apache/flink/pull/14633#issuecomment-762718369 @wuchong Can I re-run the CI tests? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267780#comment-17267780 ] Xintong Song commented on FLINK-21024: -- I think the problem is that `standalone-job.sh` fails to make sure all flink options comes before the job arguments. {{ClusterEntrypoint}} reads flink options from the beginning of the JVM process argument list. It stops at the first unrecognized option, and treats the rest as the job arguments. This is not reproduced on a standalone session cluster because job arguments are provided when submitting the job rather than when starting the cluster. I can try to provide a fix. > Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.12.1 >Reporter: Matthias >Priority: Major > Labels: starter > Attachments: WordCount.jar > > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached > [^WordCount.jar] ). > Initiating a standalone job without using the {{--input}} parameter will > result in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding {{*.out}} file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > {code} > In contrast, initiating the standalone job using the {{--input}} parameter > will expose the dynamic properties: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 --input > /opt/flink/config/flink-conf.yaml > {code} > Resulting in the following output: > {code} > [--input, /opt/flink/config/flink-conf.yaml, -D, > jobmanager.memory.off-heap.size=134217728b, -D, > jobmanager.memory.jvm-overhead.min=201326592b, -D, > jobmanager.memory.jvm-metaspace.size=268435456b, -D, > jobmanager.memory.heap.size=1073741824b, -D, > jobmanager.memory.jvm-overhead.max=201326592b] > Printing result to stdout. Use --output to specify output path. > {code} > Interestingly, this cannot be reproduced on a local standalone session > cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
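A toy model of the parsing behavior described in the comment (not Flink's actual parser): options are consumed from the front of the argument list until the first unrecognized token, and everything after it, including any trailing `-D` properties, becomes job arguments.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: why -D properties appended after user parameters leak into job args. */
public class ArgOrderSketch {
    public static void main(String[] args) {
        List<String> processArgs = Arrays.asList(
                "--job-classname", "WordCount2",
                "--input", "/opt/flink/conf/flink-conf.yaml", // unrecognized -> job args start here
                "-D", "jobmanager.memory.heap.size=1073741824b");

        List<String> jobArgs = new ArrayList<>();
        int i = 0;
        while (i < processArgs.size()) {
            String token = processArgs.get(i);
            if ("--job-classname".equals(token)) {
                i += 2; // recognized option with a value
            } else {
                // First unrecognized token: stop parsing, keep the rest as job args.
                jobArgs.addAll(processArgs.subList(i, processArgs.size()));
                break;
            }
        }
        System.out.println(jobArgs); // the -D properties end up in the job arguments
    }
}
{code}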
[jira] [Comment Edited] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267780#comment-17267780 ] Xintong Song edited comment on FLINK-21024 at 1/19/21, 9:37 AM: I think the problem is that `standalone-job.sh` fails to make sure all flink options come before the job arguments. {{ClusterEntrypoint}} reads flink options from the beginning of the JVM process argument list. It stops at the first unrecognized option, and treats the rest as the job arguments. This is not reproduced on a standalone session cluster because job arguments are provided when submitting the job rather than when starting the cluster. I can try to provide a fix. was (Author: xintongsong): I think the problem is that `standalone-job.sh` fails to make sure all flink options comes before the job arguments. {{ClusterEntrypoint}} reads flink options from the beginning of the JVM process argument list. It stops at the first unrecognized option, and treats the rest as the job arguments. This is not reproduced on a standalone session cluster because job arguments are provided when submitting the job rather than when starting the cluster. I can try to provide a fix. > Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.12.1 >Reporter: Matthias >Priority: Major > Labels: starter > Attachments: WordCount.jar > > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached > [^WordCount.jar] ). > Initiating a standalone job without using the {{--input}} parameter will > result in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding {{*.out}} file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > {code} > In contrast, initiating the standalone job using the {{--input}} parameter > will expose the dynamic properties: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 --input > /opt/flink/config/flink-conf.yaml > {code} > Resulting in the following output: > {code} > [--input, /opt/flink/config/flink-conf.yaml, -D, > jobmanager.memory.off-heap.size=134217728b, -D, > jobmanager.memory.jvm-overhead.min=201326592b, -D, > jobmanager.memory.jvm-metaspace.size=268435456b, -D, > jobmanager.memory.heap.size=1073741824b, -D, > jobmanager.memory.jvm-overhead.max=201326592b] > Printing result to stdout. Use --output to specify output path. > {code} > Interestingly, this cannot be reproduced on a local standalone session > cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267780#comment-17267780 ] Xintong Song edited comment on FLINK-21024 at 1/19/21, 9:37 AM: I think the problem is that `standalone-job.sh` fails to make sure all flink options come before the job arguments. {{ClusterEntrypoint}} reads flink options from the beginning of the JVM process argument list. It stops at the first unrecognized option, and treats the rest as the job arguments. This is not reproduced on a standalone session cluster because job arguments are provided when submitting the job rather than when starting the cluster. I can try to provide a quick fix. was (Author: xintongsong): I think the problem is that `standalone-job.sh` fails to make sure all flink options come before the job arguments. {{ClusterEntrypoint}} reads flink options from the beginning of the JVM process argument list. It stops at the first unrecognized option, and treats the rest as the job arguments. This is not reproduced on a standalone session cluster because job arguments are provided when submitting the job rather than when starting the cluster. I can try to provide a fix. > Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.12.1 >Reporter: Matthias >Priority: Major > Labels: starter > Attachments: WordCount.jar > > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached > [^WordCount.jar] ). > Initiating a standalone job without using the {{--input}} parameter will > result in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding {{*.out}} file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > {code} > In contrast, initiating the standalone job using the {{--input}} parameter > will expose the dynamic properties: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 --input > /opt/flink/config/flink-conf.yaml > {code} > Resulting in the following output: > {code} > [--input, /opt/flink/config/flink-conf.yaml, -D, > jobmanager.memory.off-heap.size=134217728b, -D, > jobmanager.memory.jvm-overhead.min=201326592b, -D, > jobmanager.memory.jvm-metaspace.size=268435456b, -D, > jobmanager.memory.heap.size=1073741824b, -D, > jobmanager.memory.jvm-overhead.max=201326592b] > Printing result to stdout. Use --output to specify output path. > {code} > Interestingly, this cannot be reproduced on a local standalone session > cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song reassigned FLINK-21024: Assignee: Xintong Song > Dynamic properties get exposed to job's main method if user parameters are > passed > - > > Key: FLINK-21024 > URL: https://issues.apache.org/jira/browse/FLINK-21024 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.12.1 >Reporter: Matthias >Assignee: Xintong Song >Priority: Major > Labels: starter > Attachments: WordCount.jar > > > A bug was identified in the [user > ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] > by Alexey exposing dynamic properties into the job user code. > I was able to reproduce this issue by slightly adapting the WordCount example > ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in attached > [^WordCount.jar] ). > Initiating a standalone job without using the {{--input}} parameter will > result in printing an empty array: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 > {code} > The corresponding {{*.out}} file looks like this: > {code} > [] > Executing WordCount2 example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > {code} > In contrast, initiating the standalone job using the {{--input}} parameter > will expose the dynamic properties: > {code} > ./bin/standalone-job.sh start --job-classname > org.apache.flink.streaming.examples.wordcount.WordCount2 --input > /opt/flink/config/flink-conf.yaml > {code} > Resulting in the following output: > {code} > [--input, /opt/flink/config/flink-conf.yaml, -D, > jobmanager.memory.off-heap.size=134217728b, -D, > jobmanager.memory.jvm-overhead.min=201326592b, -D, > jobmanager.memory.jvm-metaspace.size=268435456b, -D, > jobmanager.memory.heap.size=1073741824b, -D, > jobmanager.memory.jvm-overhead.max=201326592b] > Printing result to stdout. Use --output to specify output path. > {code} > Interestingly, this cannot be reproduced on a local standalone session > cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14633: [FLINK-20961][table-planner-blink] Fix NPE when no assigned timestamp defined
flinkbot edited a comment on pull request #14633: URL: https://github.com/apache/flink/pull/14633#issuecomment-759451108 ## CI report: * 5ba2774d208ac12b1a0d6e67c426cf06211d4e21 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11998) * 4740cf1d2d0a16700698f5f2108ee4a310d942dd UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14637: [FLINK-20949][table-planner-blink] Separate the implementation of sink nodes
flinkbot edited a comment on pull request #14637: URL: https://github.com/apache/flink/pull/14637#issuecomment-759980085 ## CI report: * f2db6f3e6c725c09b24e919ea6cc45537be86d35 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12211) * e02d849c359db923aa91235d9756f055cb44c362 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12223) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] tillrohrmann commented on a change in pull request #14692: [FLINK-20944][k8s] Do not resolve the rest endpoint address when the service exposed type is ClusterIP
tillrohrmann commented on a change in pull request #14692: URL: https://github.com/apache/flink/pull/14692#discussion_r560039104

## File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
## @@ -125,6 +121,16 @@ public String getClusterDescription() { }; }
+    private String getWebMonitorAddress(Configuration configuration) throws Exception {
+        HighAvailabilityServicesUtils.AddressResolution resolution =
+                HighAvailabilityServicesUtils.AddressResolution.TRY_ADDRESS_RESOLUTION;
+        if (configuration.get(KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE)
+                == KubernetesConfigOptions.ServiceExposedType.ClusterIP) {
+            resolution = HighAvailabilityServicesUtils.AddressResolution.NO_ADDRESS_RESOLUTION;
+        }
+        return HighAvailabilityServicesUtils.getWebMonitorAddress(configuration, resolution);
+    }

Review comment: Instead of suppressing the web monitor address resolution, can't we tell the `RestClusterClient` the address of the namespaced service? I think you mentioned in the ticket that we will communicate with the cluster through this service if we have chosen `ClusterIP`. It just feels wrong that we still retrieve the web monitor's address even though we don't need it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #14695: [FLINK-21022] [flink-connector-es] add onSuccess handler after bulk process
flinkbot commented on pull request #14695: URL: https://github.com/apache/flink/pull/14695#issuecomment-762724470 ## CI report: * 598af596abeef70bb8cb3f456fc956dce5758035 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-21008) ClusterEntrypoint#shutDownAsync may not be fully executed
[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-21008:
----------------------------------
Affects Version/s: 1.11.3
                   1.12.1

> ClusterEntrypoint#shutDownAsync may not be fully executed
> ----------------------------------------------------------
>
> Key: FLINK-21008
> URL: https://issues.apache.org/jira/browse/FLINK-21008
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Yang Wang
> Priority: Major
>
> Recently, in our internal use case for the native K8s integration with K8s HA enabled, we found that the leader-related ConfigMaps could be left behind in some corner situations.
> After some investigation, I think it is possibly caused by the inappropriate shutdown process.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call {{closeClusterComponent}}, which also includes deregistering the Flink application from the cluster management (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and {{cleanupDirectories}}. Imagine that the cluster management does the deregistration very fast: the JobManager process receives SIGNAL 15 before or while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The JVM process will then exit directly, so the two methods may not be executed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
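For illustration, a rough sketch of the shutdown ordering the description refers to; the real {{ClusterEntrypoint#shutDownAsync}} composes futures with more steps, and the names below are simplified:

{code:java}
import java.util.concurrent.CompletableFuture;

// Illustrative sketch, not the actual ClusterEntrypoint code.
public class ShutdownOrderSketch {

    // Step 1: closing the cluster components also deregisters the application
    // from the cluster manager (Yarn, K8s). On K8s this may delete the
    // JobManager deployment, which triggers a SIGTERM to this very process.
    static CompletableFuture<Void> closeClusterComponent() {
        return CompletableFuture.completedFuture(null);
    }

    // Steps 2 and 3 run only *after* deregistration. If the SIGTERM from
    // step 1 arrives first, the JVM exits and these never run, leaving
    // e.g. HA ConfigMaps behind.
    static void stopClusterServices() { /* release cluster services */ }

    static void cleanupDirectories() { /* delete working directories */ }

    public static void main(String[] args) {
        closeClusterComponent()
                .thenRun(ShutdownOrderSketch::stopClusterServices)
                .thenRun(ShutdownOrderSketch::cleanupDirectories)
                .join();
    }
}
{code}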
[jira] [Updated] (FLINK-21008) ClusterEntrypoint#shutDownAsync may not be fully executed
[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-21008:
----------------------------------
Fix Version/s: 1.13.0

> ClusterEntrypoint#shutDownAsync may not be fully executed
> ----------------------------------------------------------
>
> Key: FLINK-21008
> URL: https://issues.apache.org/jira/browse/FLINK-21008
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Yang Wang
> Priority: Major
> Fix For: 1.13.0
>
> Recently, in our internal use case for the native K8s integration with K8s HA enabled, we found that the leader-related ConfigMaps could be left behind in some corner situations.
> After some investigation, I think it is possibly caused by the inappropriate shutdown process.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call {{closeClusterComponent}}, which also includes deregistering the Flink application from the cluster management (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and {{cleanupDirectories}}. Imagine that the cluster management does the deregistration very fast: the JobManager process receives SIGNAL 15 before or while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The JVM process will then exit directly, so the two methods may not be executed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21011) Separate the implementation of StreamExecIntervalJoin
[ https://issues.apache.org/jira/browse/FLINK-21011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenlong Lyu updated FLINK-21011: Summary: Separate the implementation of StreamExecIntervalJoin (was: Separete the implementation of StreamExecJoin) > Separate the implementation of StreamExecIntervalJoin > - > > Key: FLINK-21011 > URL: https://issues.apache.org/jira/browse/FLINK-21011 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Reporter: Wenlong Lyu >Priority: Major > Fix For: 1.13.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (FLINK-21011) Separate the implementation of StreamExecIntervalJoin
[ https://issues.apache.org/jira/browse/FLINK-21011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenlong Lyu reopened FLINK-21011: - > Separate the implementation of StreamExecIntervalJoin > - > > Key: FLINK-21011 > URL: https://issues.apache.org/jira/browse/FLINK-21011 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Reporter: Wenlong Lyu >Priority: Major > Fix For: 1.13.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21008) ClusterEntrypoint#shutDownAsync may not be fully executed
[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-21008:
----------------------------------
Priority: Critical (was: Major)

> ClusterEntrypoint#shutDownAsync may not be fully executed
> ----------------------------------------------------------
>
> Key: FLINK-21008
> URL: https://issues.apache.org/jira/browse/FLINK-21008
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Yang Wang
> Priority: Critical
> Fix For: 1.13.0
>
> Recently, in our internal use case for the native K8s integration with K8s HA enabled, we found that the leader-related ConfigMaps could be left behind in some corner situations.
> After some investigation, I think it is possibly caused by the inappropriate shutdown process.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call {{closeClusterComponent}}, which also includes deregistering the Flink application from the cluster management (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and {{cleanupDirectories}}. Imagine that the cluster management does the deregistration very fast: the JobManager process receives SIGNAL 15 before or while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The JVM process will then exit directly, so the two methods may not be executed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21008) ClusterEntrypoint#shutDownAsync may not be fully executed
[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267793#comment-17267793 ] Till Rohrmann commented on FLINK-21008:
---------------------------------------

Is the problem that deregistering the application from K8s will trigger K8s to send a SIGTERM to the JobManager process? I guess this then behaves a bit differently from Yarn and needs to be changed. Is there a way to let the process terminate properly while still deleting the K8s resources (e.g. the deployment)? Maybe we need to register a shutdown hook which waits for the {{ClusterEntrypoint}} to complete its shutdown.

> ClusterEntrypoint#shutDownAsync may not be fully executed
> ----------------------------------------------------------
>
> Key: FLINK-21008
> URL: https://issues.apache.org/jira/browse/FLINK-21008
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Yang Wang
> Priority: Major
> Fix For: 1.13.0
>
> Recently, in our internal use case for the native K8s integration with K8s HA enabled, we found that the leader-related ConfigMaps could be left behind in some corner situations.
> After some investigation, I think it is possibly caused by the inappropriate shutdown process.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call {{closeClusterComponent}}, which also includes deregistering the Flink application from the cluster management (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and {{cleanupDirectories}}. Imagine that the cluster management does the deregistration very fast: the JobManager process receives SIGNAL 15 before or while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The JVM process will then exit directly, so the two methods may not be executed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
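A minimal sketch of the shutdown-hook idea, assuming the entrypoint exposes a termination future; the names below are illustrative, not the existing API:

{code:java}
import java.util.concurrent.CompletableFuture;

// Illustrative sketch of the suggested shutdown hook.
public class EntrypointShutdownHookSketch {

    public static void install(CompletableFuture<Void> entrypointTerminationFuture) {
        // The hook blocks JVM termination on SIGTERM until the
        // ClusterEntrypoint has finished its own shutdown sequence, so that
        // stopClusterServices()/cleanupDirectories() can still complete.
        Runtime.getRuntime()
                .addShutdownHook(
                        new Thread(
                                () -> {
                                    try {
                                        entrypointTerminationFuture.join();
                                    } catch (Exception e) {
                                        // Best effort: never throw from a shutdown hook.
                                    }
                                },
                                "entrypoint-shutdown-wait"));
    }
}
{code}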
[GitHub] [flink] weizheng-cloud opened a new pull request #14696: [FLINK-21022] [flink-connector-es] add onSuccess handler after bulk process
weizheng-cloud opened a new pull request #14696: URL: https://github.com/apache/flink/pull/14696

## What is the purpose of the change
*Add an onSuccess handler that is invoked after the bulk process and that users can override to run additional logic.*

## Brief change log
*Add an onSuccess handler after the bulk process; the default onSuccess handler is a no-op.*

## Verifying this change
This change added tests and can be verified as follows:
- *ElasticsearchSinkBaseTest.testInitWithSuccessHandler()*

## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
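A rough sketch of what such a hook could look like; the interface and names below are illustrative assumptions, not necessarily the API introduced by this PR:

{code:java}
// Hypothetical success callback mirroring the failure-handler style of the
// Elasticsearch sink: invoked once per bulk request that completed without
// errors, with a no-op default so existing users are unaffected.
interface BulkSuccessHandler extends java.io.Serializable {
    default void onSuccess(int numberOfActions, long tookInMillis) {
        // default: do nothing
    }
}

// Example override: record custom logging (or metrics) after each bulk.
class LoggingSuccessHandler implements BulkSuccessHandler {
    @Override
    public void onSuccess(int numberOfActions, long tookInMillis) {
        System.out.printf("bulk of %d actions succeeded in %d ms%n",
                numberOfActions, tookInMillis);
    }
}
{code}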
[GitHub] [flink] flinkbot edited a comment on pull request #14633: [FLINK-20961][table-planner-blink] Fix NPE when no assigned timestamp defined
flinkbot edited a comment on pull request #14633: URL: https://github.com/apache/flink/pull/14633#issuecomment-759451108 ## CI report: * 5ba2774d208ac12b1a0d6e67c426cf06211d4e21 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=11998) * 4740cf1d2d0a16700698f5f2108ee4a310d942dd UNKNOWN * 65bd73bc66714cfdd687947f0b1c464ed976cd64 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14647: [FLINK-20835] Implement FineGrainedSlotManager
flinkbot edited a comment on pull request #14647: URL: https://github.com/apache/flink/pull/14647#issuecomment-760154591 ## CI report: * 1c605546a2ac429e1c475eea6f80a75d0326c7bf Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12054) * 559cd8d09cca8082e9ca4904d11b11443f4c959f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14685: [FLINK-21010] Separate the implementation of StreamExecJoin
flinkbot edited a comment on pull request #14685: URL: https://github.com/apache/flink/pull/14685#issuecomment-762256472 ## CI report: * 0a5fff76d69da75391ceb25e470173bae504e1c9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12207) * 7c5fb7f4fd6cc436ae9e1da50a70878f2c70b12b UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14693: [FLINK-19446][canal][json] canal-json has a situation that -U and +U are equal, when updating the null field to be non-null
flinkbot edited a comment on pull request #14693: URL: https://github.com/apache/flink/pull/14693#issuecomment-762669697 ## CI report: * 58b45f883dfd13a36efbf22b17a487fd28d94d90 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12221) * 93d1e170043ced37964e3ace610762bc231d0ba6 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14695: [FLINK-21022] [flink-connector-es] add onSuccess handler after bulk process
flinkbot edited a comment on pull request #14695: URL: https://github.com/apache/flink/pull/14695#issuecomment-762724470 ## CI report: * 598af596abeef70bb8cb3f456fc956dce5758035 UNKNOWN * 2e99153d8828c74be95b3b452f664eaf558125fe UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #14696: [FLINK-21022] [flink-connector-es] add onSuccess handler after bulk process
flinkbot commented on pull request #14696: URL: https://github.com/apache/flink/pull/14696#issuecomment-762737540

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

## Automated Checks
Last check on commit 8bf9de1dc18bce85e63697b063fd75c992e61fbe (Tue Jan 19 10:03:54 UTC 2021)

**Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up to date!
* **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-21022).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

## Review Progress
* ❓ 1. The [description] looks good.
* ❓ 2. There is [consensus] that the contribution should go into Flink.
* ❓ 3. Needs [attention] from.
* ❓ 4. The change fits into the overall [architecture].
* ❓ 5. Overall code [quality] is good.

Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until `architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] tillrohrmann commented on a change in pull request #14683: [FLINK-20992][checkpointing] Tolerate checkpoint cleanup failures
tillrohrmann commented on a change in pull request #14683: URL: https://github.com/apache/flink/pull/14683#discussion_r560057569

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointsCleaner.java
## @@ -62,8 +62,15 @@ public void cleanCheckpoint(
             }
         }
     } finally {
-        numberOfCheckpointsToClean.decrementAndGet();
-        postCleanAction.run();
+        try {
+            numberOfCheckpointsToClean.decrementAndGet();
+            postCleanAction.run();
+        } catch (Exception e) {
+            LOG.error(
+                    "Error while cleaning up checkpoint {}",
+                    checkpoint.getCheckpointID(),
+                    e);

Review comment: Hmm, I guess this is related to the threading model of the `CheckpointCoordinator`. Could you elaborate on the deadlock you are fearing? Which other locks would be held by the `postCleanAction` which could cause a deadlock? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-20811) Support HTTP paths for yarn ship files/archives
[ https://issues.apache.org/jira/browse/FLINK-20811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267799#comment-17267799 ] Venkata S Muppalla commented on FLINK-20811:
--------------------------------------------

Thanks [~xintongsong] and [~ZhenqiuHuang]. I will work on this ticket and will make sure I follow the guidelines.

> Support HTTP paths for yarn ship files/archives
> ------------------------------------------------
>
> Key: FLINK-20811
> URL: https://issues.apache.org/jira/browse/FLINK-20811
> Project: Flink
> Issue Type: New Feature
> Components: Deployment / YARN
> Reporter: Xintong Song
> Assignee: Venkata S Muppalla
> Priority: Major
>
> Flink's Yarn integration supports shipping workload-specific local files/directories/archives to the Yarn cluster.
> As discussed in FLINK-20505, it would be helpful to support directly downloading contents from HTTP paths to the Yarn cluster, so that users won't need to first download the contents locally and then upload them to the Yarn cluster.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] tillrohrmann commented on a change in pull request #14683: [FLINK-20992][checkpointing] Tolerate checkpoint cleanup failures
tillrohrmann commented on a change in pull request #14683: URL: https://github.com/apache/flink/pull/14683#discussion_r560061568 ## File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCleanerTest.java ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.checkpoint; + +import org.apache.flink.api.common.JobID; +import org.apache.flink.runtime.concurrent.Executors; +import org.apache.flink.runtime.state.testutils.TestCompletedCheckpointStorageLocation; +import org.junit.Test; + +import java.util.Collections; +import java.util.concurrent.ExecutorService; + +import static java.util.concurrent.TimeUnit.SECONDS; +import static org.apache.flink.runtime.checkpoint.CheckpointRetentionPolicy.NEVER_RETAIN_AFTER_TERMINATION; +import static org.apache.flink.util.Preconditions.checkState; +import static org.junit.Assert.assertTrue; + +public class CheckpointCleanerTest { + +@Test +public void testTolerateFailureInPostCleanupSubmit() throws InterruptedException { Review comment: Ok, I think the problem is not the spec-based test framework but that I don't see where the failure from the submission of the post-cleanup callback is coming from. When I remove your changes of this PR, then the test also passes. Hence, this indicates to me that we are not testing the change of this PR here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] tillrohrmann closed pull request #14666: [FLINK-17085][kubernetes] Remove FlinkKubeClient.handleException
tillrohrmann closed pull request #14666: URL: https://github.com/apache/flink/pull/14666 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-17085) Remove FlinkKubeClient#handleException since it does not do anything
[ https://issues.apache.org/jira/browse/FLINK-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-17085.
---------------------------------
Fix Version/s: 1.13.0
Resolution: Fixed

Fixed via 674edcf2f46761c8949f72aa2be13b1b63031158

> Remove FlinkKubeClient#handleException since it does not do anything
> ---------------------------------------------------------------------
>
> Key: FLINK-17085
> URL: https://issues.apache.org/jira/browse/FLINK-17085
> Project: Flink
> Issue Type: Task
> Components: Deployment / Kubernetes
> Reporter: Yang Wang
> Assignee: Matthias
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.13.0
>
> Follow-up to the discussion in this PR [1].
> Currently, {{FlinkKubeClient#handleException}} just logs the exception and does not perform any substantial operation. Since all the exceptions have already been handled, for example in {{KubernetesClusterDescriptor#createClusterClientProvider #killCluster}}, it should be removed.
> Moreover, {{handleException}} causes exception duplication in the CLI.
>
> [1]. https://github.com/apache/flink/pull/11427#discussion_r404221581

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21008) ClusterEntrypoint#shutDownAsync may not be fully executed
[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267804#comment-17267804 ] Yang Wang commented on FLINK-21008:
-----------------------------------

You are right. Deregistering the application from K8s (aka deleting the JobManager deployment) will let the kubelet send a SIGTERM to the JobManager process. But Yarn has the same behavior. The reason we do not run into this issue when deploying a Flink application on Yarn is that the SIGTERM is sent a little later: the Yarn ResourceManager tells the NodeManager to kill (SIGTERM followed by a SIGKILL) the JobManager via heartbeat, which defaults to 3 seconds. On Kubernetes, however, the kubelet is informed via a watcher, with no delay. If the cluster entrypoint took more than 3 seconds for the internal cleanup ({{stopClusterServices}} and {{cleanupDirectories}}), we would run into the same situation on a Yarn deployment.

> ClusterEntrypoint#shutDownAsync may not be fully executed
> ----------------------------------------------------------
>
> Key: FLINK-21008
> URL: https://issues.apache.org/jira/browse/FLINK-21008
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Yang Wang
> Priority: Critical
> Fix For: 1.13.0
>
> Recently, in our internal use case for the native K8s integration with K8s HA enabled, we found that the leader-related ConfigMaps could be left behind in some corner situations.
> After some investigation, I think it is possibly caused by the inappropriate shutdown process.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call {{closeClusterComponent}}, which also includes deregistering the Flink application from the cluster management (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and {{cleanupDirectories}}. Imagine that the cluster management does the deregistration very fast: the JobManager process receives SIGNAL 15 before or while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The JVM process will then exit directly, so the two methods may not be executed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] zentol commented on a change in pull request #14694: [FLINK-19158][e2e] Fix wget timeout mechanism and cache config
zentol commented on a change in pull request #14694: URL: https://github.com/apache/flink/pull/14694#discussion_r560067499 ## File path: flink-end-to-end-tests/flink-end-to-end-tests-common/src/main/java/org/apache/flink/tests/util/cache/AbstractDownloadCache.java ## @@ -128,7 +128,11 @@ public Path getOrDownload(final String url, final Path targetDir) throws IOExcep Files.createDirectories(scopedDownloadDir); log.info("Downloading {}.", url); AutoClosableProcess.create( - CommandLineWrapper.wget(url).targetDir(scopedDownloadDir).build()) +CommandLineWrapper.wget(url) +.targetDir(scopedDownloadDir) +.timeoutSecs( +downloadGlobalTimeout.getSeconds() / downloadMaxRetries) Review comment: shouldn't this use the `downloadAttemptTimeout`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zentol commented on a change in pull request #14694: [FLINK-19158][e2e] Fix wget timeout mechanism and cache config
zentol commented on a change in pull request #14694: URL: https://github.com/apache/flink/pull/14694#discussion_r560067730 ## File path: tools/azure-pipelines/jobs-template.yml ## @@ -199,7 +199,7 @@ jobs: condition: not(eq(variables['SKIP'], '1')) - task: Cache@2 inputs: -key: e2e-cache | flink-end-to-end-tests/**/*.java, !**/avro/** Review comment: what is this for? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zentol commented on a change in pull request #14694: [FLINK-19158][e2e] Fix wget timeout mechanism and cache config
zentol commented on a change in pull request #14694: URL: https://github.com/apache/flink/pull/14694#discussion_r560068264 ## File path: flink-end-to-end-tests/flink-end-to-end-tests-common/src/main/java/org/apache/flink/tests/util/CommandLineWrapper.java ## @@ -45,6 +46,11 @@ public WGetBuilder targetDir(Path dir) { return this; } +public WGetBuilder timeoutSecs(long secs) { Review comment: This should accept a Duration and internally convert it to whatever unit wget expects This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
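A small sketch of the suggested signature, accepting a {{Duration}} and converting it to the seconds value that wget's `--timeout` option expects; the builder shape below is an assumption, not the actual `CommandLineWrapper` code:

{code:java}
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a Duration-based wget builder.
class WGetBuilderSketch {
    private final List<String> args = new ArrayList<>();

    WGetBuilderSketch(String url) {
        args.add("wget");
        args.add(url);
    }

    // Accept a Duration at the API boundary and convert it to the unit the
    // tool expects only when rendering the command line.
    WGetBuilderSketch timeout(Duration timeout) {
        args.add("--timeout=" + timeout.getSeconds());
        return this;
    }

    String[] build() {
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        String[] cmd = new WGetBuilderSketch("https://example.org/file.tgz")
                .timeout(Duration.ofMinutes(2))
                .build();
        // prints: wget https://example.org/file.tgz --timeout=120
        System.out.println(String.join(" ", cmd));
    }
}
{code}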
[GitHub] [flink] zentol merged pull request #14510: [FLINK-19656] [metrics] filter delimiter in metrics components
zentol merged pull request #14510: URL: https://github.com/apache/flink/pull/14510 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-19656) Automatically replace delimiter in metric name components
[ https://issues.apache.org/jira/browse/FLINK-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-19656.
------------------------------------
Resolution: Fixed

master:
559ed01976df3bfcb450e03114220b2b71eee90c
56dbc24367979fdf9bd3f83ba115db1c5680effb

> Automatically replace delimiter in metric name components
> ----------------------------------------------------------
>
> Key: FLINK-19656
> URL: https://issues.apache.org/jira/browse/FLINK-19656
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Reporter: Chesnay Schepler
> Assignee: Etienne Chauchot
> Priority: Major
> Labels: pull-request-available, starter
> Fix For: 1.13.0
>
> The metric name consists of various components (like job ID, task ID) that are then joined by a delimiter (commonly {{.}}).
> The delimiter isn't just a convention; it also carries semantics for many metric backends, as they organize metrics based on the delimiter.
> This can behave in unfortunate ways if the delimiter is contained in a given component, as it will now be split up by the backend.
> We should automatically filter such occurrences to prevent this from happening.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] zentol commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager
zentol commented on a change in pull request #14678: URL: https://github.com/apache/flink/pull/14678#discussion_r560072956

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
## @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment: The purpose of MetricGroups is not to make metadata readily accessible to other components. Why not just pass the jobID/jobName/whatever metadata you like separately into the constructor/init? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
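For illustration, the suggested alternative could look roughly like this (a sketch, not the PR's actual interface):

{code:java}
import org.apache.flink.api.common.JobID;

// Sketch: pass job metadata explicitly instead of exposing the internal
// JobManagerJobMetricGroup to the listener.
interface FailureListenerSketch {

    /**
     * @param jobId id of the job whose failures are reported
     * @param jobName name of that job
     */
    void init(JobID jobId, String jobName);

    /**
     * @param cause the failure cause
     * @param globalFailure whether the failure is a global failure
     */
    void onFailure(Throwable cause, boolean globalFailure);
}
{code}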
[GitHub] [flink] wangyang0918 commented on a change in pull request #14692: [FLINK-20944][k8s] Do not resolve the rest endpoint address when the service exposed type is ClusterIP
wangyang0918 commented on a change in pull request #14692: URL: https://github.com/apache/flink/pull/14692#discussion_r560073139

## File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
## @@ -125,6 +121,16 @@ public String getClusterDescription() { }; }
+    private String getWebMonitorAddress(Configuration configuration) throws Exception {
+        HighAvailabilityServicesUtils.AddressResolution resolution =
+                HighAvailabilityServicesUtils.AddressResolution.TRY_ADDRESS_RESOLUTION;
+        if (configuration.get(KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE)
+                == KubernetesConfigOptions.ServiceExposedType.ClusterIP) {
+            resolution = HighAvailabilityServicesUtils.AddressResolution.NO_ADDRESS_RESOLUTION;
+        }
+        return HighAvailabilityServicesUtils.getWebMonitorAddress(configuration, resolution);
+    }

Review comment: The Flink client communicates with the cluster via the namespaced service if ClusterIP is chosen. I assume you mean directly returning a `RestClusterClient` using the namespaced service (aka `restEndpoint.get().getAddress()`). After that, we would also need to check whether SSL is enabled and add the `http/https` protocol. I think that is what we already do in `HighAvailabilityServicesUtils.getWebMonitorAddress`. Moreover, I do not think we are retrieving the web monitor's address here. It is more like constructing the address in a specific schema (aka protocol://address:port). The retrieval process has already been done in `flinkKubeClient.getRestEndpoint`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
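A small sketch of the address construction being discussed, i.e. rendering a retrieved endpoint in the protocol://address:port schema depending on whether SSL is enabled (illustrative, not the actual Flink code):

{code:java}
// Illustrative sketch of the "construct, don't resolve" step.
class RestEndpointAddressSketch {

    static String webMonitorUrl(String address, int port, boolean sslEnabled) {
        // The retrieval happens elsewhere (e.g. flinkKubeClient.getRestEndpoint);
        // here we only render the endpoint in the protocol://address:port schema.
        String protocol = sslEnabled ? "https" : "http";
        return protocol + "://" + address + ":" + port;
    }
}
{code}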
[jira] [Comment Edited] (FLINK-21008) ClusterEntrypoint#shutDownAsync may not be fully executed
[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267804#comment-17267804 ] Yang Wang edited comment on FLINK-21008 at 1/19/21, 10:31 AM:
--------------------------------------------------------------

You are right. Deregistering the application from K8s (aka deleting the JobManager deployment) will let the kubelet send a SIGTERM to the JobManager process. But Yarn has the same behavior. The reason we do not run into this issue when deploying a Flink application on Yarn is that the SIGTERM is sent a little later: the Yarn ResourceManager tells the NodeManager to kill (SIGTERM followed by a SIGKILL) the JobManager via heartbeat, which defaults to 3 seconds. On Kubernetes, however, the kubelet is informed via a watcher, with no delay. If the cluster entrypoint took more than 3 seconds for the internal cleanup ({{stopClusterServices}} and {{cleanupDirectories}}), we would run into the same situation on a Yarn deployment.

was (Author: fly_in_gis):
You are right. Deregistering the application from K8s(aka delete the JobManager deployment) will let the kubelet send a SIGTERM to JobManager process. But the Yarn has the same behavior. The reason why we do not come into this issue when deploying Flink application on Yarn is that the SIGTERM is sent a little late. Because Yarn ResourceManager tell NodeManager to kill(SIGTERM and followed by a SIGKILL) the JobManager via heartbeat, which is 3 seconds by default. However, on Kubernetes, kubelet is informed via watcher, which is no delay. Assume that the cluster entrypoint costs more that 3 second for the internal clean up( {{stopClusterServices}} and {{cleanupDirectories}}), we will run into the same situation on Yarn deployment.

> ClusterEntrypoint#shutDownAsync may not be fully executed
> ----------------------------------------------------------
>
> Key: FLINK-21008
> URL: https://issues.apache.org/jira/browse/FLINK-21008
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Yang Wang
> Priority: Critical
> Fix For: 1.13.0
>
> Recently, in our internal use case for the native K8s integration with K8s HA enabled, we found that the leader-related ConfigMaps could be left behind in some corner situations.
> After some investigation, I think it is possibly caused by the inappropriate shutdown process.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call {{closeClusterComponent}}, which also includes deregistering the Flink application from the cluster management (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and {{cleanupDirectories}}. Imagine that the cluster management does the deregistration very fast: the JobManager process receives SIGNAL 15 before or while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The JVM process will then exit directly, so the two methods may not be executed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14633: [FLINK-20961][table-planner-blink] Fix NPE when no assigned timestamp defined
flinkbot edited a comment on pull request #14633: URL: https://github.com/apache/flink/pull/14633#issuecomment-759451108 ## CI report: * 4740cf1d2d0a16700698f5f2108ee4a310d942dd UNKNOWN * 65bd73bc66714cfdd687947f0b1c464ed976cd64 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12224) * 00801e0afc7a16f4dbe36493b6df148b70af1de2 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14637: [FLINK-20949][table-planner-blink] Separate the implementation of sink nodes
flinkbot edited a comment on pull request #14637: URL: https://github.com/apache/flink/pull/14637#issuecomment-759980085 ## CI report: * e02d849c359db923aa91235d9756f055cb44c362 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12223) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14647: [FLINK-20835] Implement FineGrainedSlotManager
flinkbot edited a comment on pull request #14647: URL: https://github.com/apache/flink/pull/14647#issuecomment-760154591 ## CI report: * 1c605546a2ac429e1c475eea6f80a75d0326c7bf Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12054) * 559cd8d09cca8082e9ca4904d11b11443f4c959f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12226) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14685: [FLINK-21010] Separate the implementation of StreamExecJoin
flinkbot edited a comment on pull request #14685: URL: https://github.com/apache/flink/pull/14685#issuecomment-762256472 ## CI report: * 0a5fff76d69da75391ceb25e470173bae504e1c9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12207) * 7c5fb7f4fd6cc436ae9e1da50a70878f2c70b12b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12227) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14693: [FLINK-19446][canal][json] canal-json has a situation that -U and +U are equal, when updating the null field to be non-null
flinkbot edited a comment on pull request #14693: URL: https://github.com/apache/flink/pull/14693#issuecomment-762669697 ## CI report: * 58b45f883dfd13a36efbf22b17a487fd28d94d90 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12221) * 93d1e170043ced37964e3ace610762bc231d0ba6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12228) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14689: [FLINK-20500][upsert-kafka] Fix temporal join test
flinkbot edited a comment on pull request #14689: URL: https://github.com/apache/flink/pull/14689#issuecomment-762595621 ## CI report: * 7421043232e94c67d4e44020b8a5a319a3e95a09 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12213) * 3e8786051223f482699028c5d1683aa66e44e1aa UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #14696: [FLINK-21022] [flink-connector-es] add onSuccess handler after bulk process
flinkbot commented on pull request #14696: URL: https://github.com/apache/flink/pull/14696#issuecomment-762754413 ## CI report: * 8bf9de1dc18bce85e63697b063fd75c992e61fbe UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21024) Dynamic properties get exposed to job's main method if user parameters are passed
[ https://issues.apache.org/jira/browse/FLINK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267817#comment-17267817 ] Matthias commented on FLINK-21024:
----------------------------------

Interesting finding. Thanks for looking into that, [~xintongsong]. [bin/mesos-jobmanager.sh|https://github.com/apache/flink/blob/0a51d85255b9c7480eb5e939d88e9ccc5e98af69/flink-dist/src/main/flink-bin/mesos-bin/mesos-jobmanager.sh#L32] needs to be adapted accordingly.

> Dynamic properties get exposed to job's main method if user parameters are passed
> ----------------------------------------------------------------------------------
>
> Key: FLINK-21024
> URL: https://issues.apache.org/jira/browse/FLINK-21024
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Configuration
> Affects Versions: 1.12.1
> Reporter: Matthias
> Assignee: Xintong Song
> Priority: Major
> Labels: starter
> Attachments: WordCount.jar
>
> A bug was identified in the [user ML|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-cluster-standalone-job-some-JVM-Options-added-to-Program-Arguments-td40719.html] by Alexey, exposing dynamic properties to the job's user code.
> I was able to reproduce this issue by slightly adapting the WordCount example ({{org.apache.flink.streaming.examples.wordcount.WordCount2}} in the attached [^WordCount.jar]).
> Initiating a standalone job without the {{--input}} parameter will result in printing an empty array:
> {code}
> ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2
> {code}
> The corresponding {{*.out}} file looks like this:
> {code}
> []
> Executing WordCount2 example with default input data set.
> Use --input to specify file input.
> Printing result to stdout. Use --output to specify output path.
> {code}
> In contrast, initiating the standalone job using the {{--input}} parameter will expose the dynamic properties:
> {code}
> ./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.wordcount.WordCount2 --input /opt/flink/config/flink-conf.yaml
> {code}
> Resulting in the following output:
> {code}
> [--input, /opt/flink/config/flink-conf.yaml, -D, jobmanager.memory.off-heap.size=134217728b, -D, jobmanager.memory.jvm-overhead.min=201326592b, -D, jobmanager.memory.jvm-metaspace.size=268435456b, -D, jobmanager.memory.heap.size=1073741824b, -D, jobmanager.memory.jvm-overhead.max=201326592b]
> Printing result to stdout. Use --output to specify output path.
> {code}
> Interestingly, this cannot be reproduced on a local standalone session cluster.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-21027) Add isKeyValueImmutable() method to KeyedStateBackend interface
Jark Wu created FLINK-21027:
----------------------------
Summary: Add isKeyValueImmutable() method to KeyedStateBackend interface
Key: FLINK-21027
URL: https://issues.apache.org/jira/browse/FLINK-21027
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Reporter: Jark Wu

In Table/SQL operators, we have some optimizations that reuse key and record objects. For example, we buffer input records in {{BytesMultiMap}} and use a reused object to map onto the underlying memory segment to reduce byte copying. However, if we put such a reused key or value into the heap state backend, the result will be wrong, because mutating keys and values stored in the heap state backend is not allowed. Therefore, it would be great if {{KeyedStateBackend}} could expose such an API, so that Table/SQL operators can dynamically decide whether to copy keys and values before putting them into state.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
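A minimal sketch of the proposed API and how an operator could use it to decide whether defensive copies are needed; the method name is taken from the ticket summary, while the usage around it is an illustrative assumption:

{code:java}
import java.util.function.UnaryOperator;

// Sketch of the proposed addition to KeyedStateBackend.
interface KeyedStateBackendSketch<K> {
    /**
     * Returns true if the backend stores keys and values by reference (e.g. a
     * heap backend), in which case callers must not mutate or reuse the
     * objects they hand in; returns false if the backend serializes on write
     * (e.g. RocksDB), where object reuse is safe.
     */
    boolean isKeyValueImmutable();
}

// Hypothetical operator-side usage.
class OperatorSketch {
    <K, V> V maybeCopy(KeyedStateBackendSketch<K> backend, V reusedValue, UnaryOperator<V> copier) {
        // Only pay the copy cost when the backend keeps object references.
        return backend.isKeyValueImmutable() ? copier.apply(reusedValue) : reusedValue;
    }
}
{code}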
[jira] [Commented] (FLINK-21008) ClusterEntrypoint#shutDownAsync may not be fully executed
[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267821#comment-17267821 ] Till Rohrmann commented on FLINK-21008:
---------------------------------------

I see. An alternative solution would then be to signal the external system to shut down only after the whole Flink cleanup has been done. The problem here is that the communication logic with the external system is encapsulated in the {{ResourceManager}}, which at this point is already shut down.

> ClusterEntrypoint#shutDownAsync may not be fully executed
> ----------------------------------------------------------
>
> Key: FLINK-21008
> URL: https://issues.apache.org/jira/browse/FLINK-21008
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Yang Wang
> Priority: Critical
> Fix For: 1.13.0
>
> Recently, in our internal use case for the native K8s integration with K8s HA enabled, we found that the leader-related ConfigMaps could be left behind in some corner situations.
> After some investigation, I think it is possibly caused by the inappropriate shutdown process.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call {{closeClusterComponent}}, which also includes deregistering the Flink application from the cluster management (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and {{cleanupDirectories}}. Imagine that the cluster management does the deregistration very fast: the JobManager process receives SIGNAL 15 before or while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The JVM process will then exit directly, so the two methods may not be executed.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] tillrohrmann commented on a change in pull request #14692: [FLINK-20944][k8s] Do not resolve the rest endpoint address when the service exposed type is ClusterIP
tillrohrmann commented on a change in pull request #14692: URL: https://github.com/apache/flink/pull/14692#discussion_r560082524 ## File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java ## @@ -125,6 +121,16 @@ public String getClusterDescription() { }; } +private String getWebMonitorAddress(Configuration configuration) throws Exception { +HighAvailabilityServicesUtils.AddressResolution resolution = + HighAvailabilityServicesUtils.AddressResolution.TRY_ADDRESS_RESOLUTION; +if (configuration.get(KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE) +== KubernetesConfigOptions.ServiceExposedType.ClusterIP) { +resolution = HighAvailabilityServicesUtils.AddressResolution.NO_ADDRESS_RESOLUTION; +} +return HighAvailabilityServicesUtils.getWebMonitorAddress(configuration, resolution); +} Review comment: In the K8s case, from where exactly do we retrieve the address of the service? If I understood you correctly, then `RestOptions.ADDRESS` contains some address which is not resolvable from the outside. Hence, I am wondering why we should try to construct the web monitor address from this configuration at all. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-21028) Streaming application didn't stop properly
Theo Diefenthal created FLINK-21028:
------------------------------------

Summary: Streaming application didn't stop properly
Key: FLINK-21028
URL: https://issues.apache.org/jira/browse/FLINK-21028
Project: Flink
Issue Type: Bug
Affects Versions: 1.11.2
Reporter: Theo Diefenthal

I have a Flink job running on YARN with a disjoint graph, i.e. a single job contains two independent and isolated pipelines. From time to time, I stop the job with a savepoint like so:

{code:java}
flink stop -p ${SAVEPOINT_BASEDIR}/${FLINK_JOB_NAME}/SAVEPOINTS --yarnapplicationId=${FLINK_YARN_APPID} ${ID}
{code}

A few days ago, this job suddenly didn't stop properly as usual but ran into a possible race condition. On the CLI, the stop command failed with a simple timeout:

{code:java}
org.apache.flink.util.FlinkException: Could not stop with a savepoint job "f23290bf5fb0ecd49a4455e4a65f2eb6".
 at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
 at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
 at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
 at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
 at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
 at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
 at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.util.concurrent.TimeoutException
 at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
 ... 9 more
{code}

The root of the problem, however, is that a TaskManager hit an exception during shutdown, which led to restarting (a part of) the pipeline and put it back into RUNNING state; thus the console command for stopping timed out (as the job was partially back in RUNNING state). The exception in the logs, which looks like a race condition to me, is:

{code:java}
2021-01-12T06:15:15.827877+01:00 WARN org.apache.flink.runtime.taskmanager.Task Source: rawdata_source1 -> validation_source1 -> enrich_source1 -> map_json_source1 -> Sink: write_to_kafka_source1) (3/18) (bc68320cf69dd877782417a3298499d6) switched from RUNNING to FAILED.
java.util.concurrent.ExecutionException: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.quiesceTimeServiceAndCloseOperator(StreamOperatorWrapper.java:161)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:130)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:134)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:134)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:134)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:134)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:134)
 at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:80)
 at org.apache.flink.streaming.runtime.tasks.OperatorChain.closeOperators(OperatorChain.java:302)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.afterInvoke(StreamTask.java:576)
 at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:544)
 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
 at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.emitWatermark(OperatorChain.java:642)
 at org.apache.flink.streaming.api.operators.CountingOutput.emitWatermark(CountingOutput.java:41)
 at org.apache.flink.streaming.runtime.operators.TimestampsAndWatermarksOperator$WatermarkEmitter.emitWatermark(TimestampsAndWatermarksOperator.java:165)
 at org.apache.flink.streaming.runtime.operators.util.AssignerWithPeriodicWatermarksAdapter
{code}
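[Editor's note] For readers unfamiliar with the term, a "disjoint graph" job simply bundles several unconnected pipelines into one submission. A minimal, self-contained sketch (stand-in sources and sinks, not the reporter's actual Kafka topology):

{code:java}
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public final class DisjointJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Pipeline 1: completely independent of pipeline 2.
        env.fromElements("a", "b", "c")
                .map(s -> s.toUpperCase())
                .returns(Types.STRING)
                .print();

        // Pipeline 2: shares nothing with pipeline 1 except the job itself.
        env.fromElements(1, 2, 3)
                .map(i -> i * 2)
                .returns(Types.INT)
                .print();

        // Both pipelines are submitted as a single job, so "flink stop" has to
        // take one consistent savepoint across both of them.
        env.execute("disjoint-graph-job");
    }
}
{code}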
[GitHub] [flink] wangyang0918 commented on a change in pull request #14692: [FLINK-20944][k8s] Do not resolve the rest endpoint address when the service exposed type is ClusterIP
wangyang0918 commented on a change in pull request #14692: URL: https://github.com/apache/flink/pull/14692#discussion_r560084849

## File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
## @@ -125,6 +121,16 @@ public String getClusterDescription() {
         };
     }

+    private String getWebMonitorAddress(Configuration configuration) throws Exception {
+        HighAvailabilityServicesUtils.AddressResolution resolution =
+                HighAvailabilityServicesUtils.AddressResolution.TRY_ADDRESS_RESOLUTION;
+        if (configuration.get(KubernetesConfigOptions.REST_SERVICE_EXPOSED_TYPE)
+                == KubernetesConfigOptions.ServiceExposedType.ClusterIP) {
+            resolution = HighAvailabilityServicesUtils.AddressResolution.NO_ADDRESS_RESOLUTION;
+        }
+        return HighAvailabilityServicesUtils.getWebMonitorAddress(configuration, resolution);
+    }

Review comment:
After a little more consideration, maybe we never need to resolve the rest endpoint address at all. When the service exposed type is NodePort or LoadBalancer, the rest endpoint address is usually an IP address, not a hostname. For ClusterIP, we also do not need to resolve the address.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
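[Editor's note] If the suggestion above were adopted, the helper from the diff would collapse to a single call. A sketch mirroring the diff's own API usage, not the merged change:

```java
// Sketch of the proposed simplification: skip hostname resolution for all
// exposed types, since NodePort/LoadBalancer typically yield raw IPs anyway.
private String getWebMonitorAddress(Configuration configuration) throws Exception {
    return HighAvailabilityServicesUtils.getWebMonitorAddress(
            configuration,
            HighAvailabilityServicesUtils.AddressResolution.NO_ADDRESS_RESOLUTION);
}
```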
[GitHub] [flink] dawidwys commented on a change in pull request #14660: [FLINK-20971][table-api-java] Support calling SQL expressions in Table API
dawidwys commented on a change in pull request #14660: URL: https://github.com/apache/flink/pull/14660#discussion_r560069421

## File path: docs/dev/table/functions/systemFunctions.md
## @@ -5636,6 +5638,21 @@ STRING.sha2(INT)
{% highlight java %}
+callSql(STRING)
{% endhighlight %}

A call to a SQL expression.
The given string is parsed and translated into an Table API expression during planning. Only

Review comment:
```suggestion
The given string is parsed and translated into a Table API expression during planning. Only
```

## File path: docs/dev/table/functions/systemFunctions.md
## @@ -34,6 +34,8 @@ Scalar Functions

The scalar functions take zero, one or more values as the input and return a single value as the result.
+Note: You can

Review comment:
Something's missing here ;)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
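[Editor's note] For context, a hedged usage sketch of the feature under review, assuming the expression is exposed as `Expressions.callSql` (the docs diff above only shows the signature table entry, so the entry point and column names here are illustrative):

```java
import org.apache.flink.table.api.Table;

import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.callSql;

// `input` is an existing Table with hypothetical `id` and `name` columns; the
// SQL snippet is parsed and translated into a Table API expression at planning time.
Table result = input.select($("id"), callSql("UPPER(name)").as("upper_name"));
```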