[GitHub] flink issue #2265: [FLINK-3097] [table] Add support for custom functions in ...
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2265

No, there is no FLIP about it. I think a discussion in JIRA or in this PR should be enough. That's why I haven't documented it yet. I was inspired by your [document](https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit). You are right, `ScalarFunction` has many internal functions, but they are not exposed to the user; only two methods can be overridden. An interface is not enough, as it might sometimes be necessary to override `getReturnType` and `getParameterType`.
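For readers following along, a minimal sketch of what such a user-defined scalar function could look like, assuming an abstract `ScalarFunction` base class with an `eval` method and the two overridable type hooks named in the comment (class name, package, and hook signatures are taken from this discussion and are illustrative, not the merged API):

{code}
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
// import ...table.functions.ScalarFunction;  // assumed base-class package

// Illustrative sketch only: signatures follow the PR discussion,
// not necessarily the final merged API.
public class TrimFunction extends ScalarFunction {

    // the evaluation method that the user implements
    public String eval(String s) {
        return s.trim();
    }

    // sometimes necessary when the return type cannot be inferred
    // from the eval signature (hypothetical hook signature)
    public TypeInformation<?> getReturnType(Class<?>[] signature) {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}
{code}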
[jira] [Commented] (FLINK-3097) Add support for custom functions in Table API
[ https://issues.apache.org/jira/browse/FLINK-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383717#comment-15383717 ]

ASF GitHub Bot commented on FLINK-3097:

Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2265

No, there is no FLIP about it. I think a discussion in JIRA or in this PR should be enough. That's why I haven't documented it yet. I was inspired by your [document](https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit). You are right, `ScalarFunction` has many internal functions, but they are not exposed to the user; only two methods can be overridden. An interface is not enough, as it might sometimes be necessary to override `getReturnType` and `getParameterType`.

> Add support for custom functions in Table API
>
> Key: FLINK-3097
> URL: https://issues.apache.org/jira/browse/FLINK-3097
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: Timo Walther
> Assignee: Timo Walther
>
> Currently, the Table API has a very limited set of built-in functions. Support for custom functions can solve this problem. Adding a custom row function could look like:
>
> {code}
> TableEnvironment tableEnv = new TableEnvironment();
> RowFunction<String> rf = new RowFunction<String>() {
>     @Override
>     public String call(Object[] args) {
>         return ((String) args[0]).trim();
>     }
> };
> tableEnv.getConfig().registerRowFunction("TRIM", rf, BasicTypeInfo.STRING_TYPE_INFO);
> DataSource<Tuple1<String>> input = env.fromElements(new Tuple1<>(" 1 "));
> Table table = tableEnv.fromDataSet(input);
> Table result = table.select("TRIM(f0)");
> {code}
>
> This feature is also necessary as part of FLINK-2099.
[jira] [Commented] (FLINK-4183) Move checking for StreamTableEnvironment into validation layer
[ https://issues.apache.org/jira/browse/FLINK-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383738#comment-15383738 ]

ASF GitHub Bot commented on FLINK-4183:

Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2221

I will merge this later today if there are no objections...

> Move checking for StreamTableEnvironment into validation layer
>
> Key: FLINK-4183
> URL: https://issues.apache.org/jira/browse/FLINK-4183
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Timo Walther
> Assignee: Timo Walther
> Priority: Minor
>
> Some operators check the environment in `table.scala` instead of doing this during the validation phase.
[GitHub] flink issue #2221: [FLINK-4183] [table] Move checking for StreamTableEnviron...
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2221

I will merge this later today if there are no objections...
[jira] [Created] (FLINK-4232) Flink executable does not return correct pid
David Moravek created FLINK-4232:

Summary: Flink executable does not return correct pid
Key: FLINK-4232
URL: https://issues.apache.org/jira/browse/FLINK-4232
Project: Flink
Issue Type: Bug
Affects Versions: 1.0.3
Reporter: David Moravek
Priority: Minor

E.g. when using supervisor, the pid returned by ./bin/flink is the pid of the shell executable instead of the Java process.
[GitHub] flink pull request #2268: [FLINK-4232] Make sure ./bin/flink returns correct...
GitHub user dmvk opened a pull request: https://github.com/apache/flink/pull/2268

[FLINK-4232] Make sure ./bin/flink returns correct pid

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dmvk/flink FLINK-4232

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2268

commit 17fa3e1e9a3350c843abceed9611218fe0bf287a
Author: David Moravek
Date: 2016-07-19T08:15:06Z

[FLINK-4232] Make sure ./bin/flink returns correct pid
[jira] [Commented] (FLINK-4232) Flink executable does not return correct pid
[ https://issues.apache.org/jira/browse/FLINK-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383770#comment-15383770 ]

ASF GitHub Bot commented on FLINK-4232:

GitHub user dmvk opened a pull request: https://github.com/apache/flink/pull/2268

[FLINK-4232] Make sure ./bin/flink returns correct pid

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dmvk/flink FLINK-4232

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2268

commit 17fa3e1e9a3350c843abceed9611218fe0bf287a
Author: David Moravek
Date: 2016-07-19T08:15:06Z

[FLINK-4232] Make sure ./bin/flink returns correct pid

> Flink executable does not return correct pid
>
> Key: FLINK-4232
> URL: https://issues.apache.org/jira/browse/FLINK-4232
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.0.3
> Reporter: David Moravek
> Priority: Minor
>
> E.g. when using supervisor, the pid returned by ./bin/flink is the pid of the shell executable instead of the Java process.
[GitHub] flink issue #1989: FLINK-3901 - Added CsvRowInputFormat
Github user fpompermaier commented on the issue: https://github.com/apache/flink/pull/1989

Hi @twalthr, any news about this? Are you going to merge this PR in the upcoming 1.1 release? If you were waiting for feedback, you have my +1 on your strategy: go ahead and convert the RowCsvInputFormat class to Scala and keep the test code in Java!
[jira] [Commented] (FLINK-3901) Create a RowCsvInputFormat to use as default CSV IF in Table API
[ https://issues.apache.org/jira/browse/FLINK-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383819#comment-15383819 ]

ASF GitHub Bot commented on FLINK-3901:

Github user fpompermaier commented on the issue: https://github.com/apache/flink/pull/1989

Hi @twalthr, any news about this? Are you going to merge this PR in the upcoming 1.1 release? If you were waiting for feedback, you have my +1 on your strategy: go ahead and convert the RowCsvInputFormat class to Scala and keep the test code in Java!

> Create a RowCsvInputFormat to use as default CSV IF in Table API
>
> Key: FLINK-3901
> URL: https://issues.apache.org/jira/browse/FLINK-3901
> Project: Flink
> Issue Type: Improvement
> Affects Versions: 1.0.2
> Reporter: Flavio Pompermaier
> Assignee: Flavio Pompermaier
> Priority: Minor
> Labels: csv, null-values, row, tuple
>
> At the moment the Table API reads CSVs using the TupleCsvInputFormat, which has the big limitations of a 25-field maximum and no support for null values. A new IF producing Row objects is indeed necessary to avoid those limitations.
[jira] [Commented] (FLINK-4223) Rearrange scaladoc and javadoc for Scala API
[ https://issues.apache.org/jira/browse/FLINK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383846#comment-15383846 ]

Maximilian Michels commented on FLINK-4223:

Seems like a duplicate of FLINK-3710.

> Rearrange scaladoc and javadoc for Scala API
>
> Key: FLINK-4223
> URL: https://issues.apache.org/jira/browse/FLINK-4223
> Project: Flink
> Issue Type: Improvement
> Components: Documentation
> Reporter: Chiwan Park
> Priority: Minor
> Labels: easyfix, newbie
>
> Currently, some scaladocs for Scala API (Gelly Scala API, FlinkML, Streaming Scala API) are not in scaladoc but in javadoc.
[jira] [Closed] (FLINK-4223) Rearrange scaladoc and javadoc for Scala API
[ https://issues.apache.org/jira/browse/FLINK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels closed FLINK-4223.

Resolution: Duplicate

> Rearrange scaladoc and javadoc for Scala API
>
> Key: FLINK-4223
> URL: https://issues.apache.org/jira/browse/FLINK-4223
> Project: Flink
> Issue Type: Improvement
> Components: Documentation
> Reporter: Chiwan Park
> Priority: Minor
> Labels: easyfix, newbie
>
> Currently, some scaladocs for Scala API (Gelly Scala API, FlinkML, Streaming Scala API) are not in scaladoc but in javadoc.
[GitHub] flink issue #2265: [FLINK-3097] [table] Add support for custom functions in ...
Github user wuchong commented on the issue: https://github.com/apache/flink/pull/2265

Yes, you are right. I'm just a little concerned about the class name of `ScalarFunction`, haha.. In addition, I think the Java Table API should use `table.select("hashCode(text)");`, which is better. If the eval function takes two or more parameters, `"udf(a, b)"` still works and stays consistent with the Scala Table API and SQL syntax.
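To make the suggested syntax concrete, a hedged sketch (assuming a `Table` named `table` with columns `text`, `a`, and `b`, and user-defined functions already registered under the names `hashCode` and `udf` — all names here are illustrative):

{code}
// prefix (SQL-like) notation, as suggested above
Table result = table.select("hashCode(text)");

// the same notation extends naturally to several parameters and stays
// consistent with the Scala Table API and SQL
Table result2 = table.select("udf(a, b)");
{code}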
[jira] [Commented] (FLINK-3097) Add support for custom functions in Table API
[ https://issues.apache.org/jira/browse/FLINK-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383858#comment-15383858 ]

ASF GitHub Bot commented on FLINK-3097:

Github user wuchong commented on the issue: https://github.com/apache/flink/pull/2265

Yes, you are right. I'm just a little concerned about the class name of `ScalarFunction`, haha.. In addition, I think the Java Table API should use `table.select("hashCode(text)");`, which is better. If the eval function takes two or more parameters, `"udf(a, b)"` still works and stays consistent with the Scala Table API and SQL syntax.

> Add support for custom functions in Table API
>
> Key: FLINK-3097
> URL: https://issues.apache.org/jira/browse/FLINK-3097
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: Timo Walther
> Assignee: Timo Walther
>
> Currently, the Table API has a very limited set of built-in functions. Support for custom functions can solve this problem. Adding a custom row function could look like:
>
> {code}
> TableEnvironment tableEnv = new TableEnvironment();
> RowFunction<String> rf = new RowFunction<String>() {
>     @Override
>     public String call(Object[] args) {
>         return ((String) args[0]).trim();
>     }
> };
> tableEnv.getConfig().registerRowFunction("TRIM", rf, BasicTypeInfo.STRING_TYPE_INFO);
> DataSource<Tuple1<String>> input = env.fromElements(new Tuple1<>(" 1 "));
> Table table = tableEnv.fromDataSet(input);
> Table result = table.select("TRIM(f0)");
> {code}
>
> This feature is also necessary as part of FLINK-2099.
[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2109

I'll try to review this as soon as possible. Maybe @kl0u or @zentol also want to have another look.
[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns
[ https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383863#comment-15383863 ]

ASF GitHub Bot commented on FLINK-3677:

Github user mxm commented on the issue: https://github.com/apache/flink/pull/2109

I'll try to review this as soon as possible. Maybe @kl0u or @zentol also want to have another look.

> FileInputFormat: Allow to specify include/exclude file name patterns
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Maximilian Michels
> Assignee: Ivan Mushketyk
> Priority: Minor
> Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.
[GitHub] flink issue #2265: [FLINK-3097] [table] Add support for custom functions in ...
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2265

Yes, @wuchong's suggestion for the Java Table API seems more extensible.
[jira] [Commented] (FLINK-3097) Add support for custom functions in Table API
[ https://issues.apache.org/jira/browse/FLINK-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383869#comment-15383869 ]

ASF GitHub Bot commented on FLINK-3097:

Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2265

Yes, @wuchong's suggestion for the Java Table API seems more extensible.

> Add support for custom functions in Table API
>
> Key: FLINK-3097
> URL: https://issues.apache.org/jira/browse/FLINK-3097
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: Timo Walther
> Assignee: Timo Walther
>
> Currently, the Table API has a very limited set of built-in functions. Support for custom functions can solve this problem. Adding a custom row function could look like:
>
> {code}
> TableEnvironment tableEnv = new TableEnvironment();
> RowFunction<String> rf = new RowFunction<String>() {
>     @Override
>     public String call(Object[] args) {
>         return ((String) args[0]).trim();
>     }
> };
> tableEnv.getConfig().registerRowFunction("TRIM", rf, BasicTypeInfo.STRING_TYPE_INFO);
> DataSource<Tuple1<String>> input = env.fromElements(new Tuple1<>(" 1 "));
> Table table = tableEnv.fromDataSet(input);
> Table result = table.select("TRIM(f0)");
> {code}
>
> This feature is also necessary as part of FLINK-2099.
[GitHub] flink issue #2249: [FLINK-4166] [CLI] Generate different namespaces for Zook...
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2249

Thanks for the update. Could you rebase to the latest master?
[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383888#comment-15383888 ]

ASF GitHub Bot commented on FLINK-4166:

Github user mxm commented on the issue: https://github.com/apache/flink/pull/2249

Thanks for the update. Could you rebase to the latest master?

> Generate automatic different namespaces in Zookeeper for Flink applications
>
> Key: FLINK-4166
> URL: https://issues.apache.org/jira/browse/FLINK-4166
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.0.3
> Reporter: Stefan Richter
>
> We should automatically generate different namespaces per Flink application in Zookeeper to avoid interference between different applications that refer to the same Zookeeper entries.
[GitHub] flink pull request #2249: [FLINK-4166] [CLI] Generate different namespaces f...
Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/2249#discussion_r71309594

--- Diff: docs/setup/config.md ---
@@ -272,7 +272,9 @@ For example when running Flink on YARN on an environment with a restrictive fire

 - `recovery.zookeeper.quorum`: Defines the ZooKeeper quorum URL which is used to connet to the ZooKeeper cluster when the 'zookeeper' recovery mode is selected

-- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create znodes.
+- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create namespace directories.
+
+- `recovery.zookeeper.path.namespace`: (Default '/default_ns' in standalone mode, or the <yarn-application-id> under Yarn) Defines the subdirectory under the root dir where the ZooKeeper recovery mode will create znodes. This allows to isolate multiple applications on the same ZooKeeper.

--- End diff --

I think the main problem with default namespaces in standalone mode is that there is no authority generating a reliably unique identifier that we can use to label clusters (in contrast to e.g. application ids in Yarn). Also, the contents of the masters file could be the same for two different clusters running on the same machines, so collisions could happen. In particular, such collisions could be very rare and hence even more surprising to the user. Furthermore, I think that running multiple clusters in this way is exactly the situation in which you want to start Yarn or Mesos.
[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383892#comment-15383892 ]

ASF GitHub Bot commented on FLINK-4166:

Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/2249#discussion_r71309594

--- Diff: docs/setup/config.md ---
@@ -272,7 +272,9 @@ For example when running Flink on YARN on an environment with a restrictive fire

 - `recovery.zookeeper.quorum`: Defines the ZooKeeper quorum URL which is used to connet to the ZooKeeper cluster when the 'zookeeper' recovery mode is selected

-- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create znodes.
+- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create namespace directories.
+
+- `recovery.zookeeper.path.namespace`: (Default '/default_ns' in standalone mode, or the <yarn-application-id> under Yarn) Defines the subdirectory under the root dir where the ZooKeeper recovery mode will create znodes. This allows to isolate multiple applications on the same ZooKeeper.

--- End diff --

I think the main problem with default namespaces in standalone mode is that there is no authority generating a reliably unique identifier that we can use to label clusters (in contrast to e.g. application ids in Yarn). Also, the contents of the masters file could be the same for two different clusters running on the same machines, so collisions could happen. In particular, such collisions could be very rare and hence even more surprising to the user. Furthermore, I think that running multiple clusters in this way is exactly the situation in which you want to start Yarn or Mesos.

> Generate automatic different namespaces in Zookeeper for Flink applications
>
> Key: FLINK-4166
> URL: https://issues.apache.org/jira/browse/FLINK-4166
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.0.3
> Reporter: Stefan Richter
>
> We should automatically generate different namespaces per Flink application in Zookeeper to avoid interference between different applications that refer to the same Zookeeper entries.
[GitHub] flink pull request #2249: [FLINK-4166] [CLI] Generate different namespaces f...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2249#discussion_r71311352

--- Diff: docs/setup/config.md ---
@@ -272,7 +272,9 @@ For example when running Flink on YARN on an environment with a restrictive fire

 - `recovery.zookeeper.quorum`: Defines the ZooKeeper quorum URL which is used to connet to the ZooKeeper cluster when the 'zookeeper' recovery mode is selected

-- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create znodes.
+- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create namespace directories.
+
+- `recovery.zookeeper.path.namespace`: (Default '/default_ns' in standalone mode, or the <yarn-application-id> under Yarn) Defines the subdirectory under the root dir where the ZooKeeper recovery mode will create znodes. This allows to isolate multiple applications on the same ZooKeeper.

--- End diff --

You're right, it would just be a heuristic for generating a unique namespace, but it is not truly unique because multiple Flink instances might be running on the same host names (e.g. multiple processes, Docker containers). So +1 for not adding something that can cause hard-to-debug problems. We have covered the Yarn use case, which makes it very convenient to set up a namespace.
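To illustrate how the two configuration keys under discussion interact, a rough sketch of the path composition (the key names and defaults come from the diff above; the helper itself and its composition logic are an assumption for illustration, not Flink's actual implementation):

{code}
import java.util.Properties;

public class ZkPathExample {
    // Assumed helper, not Flink code: shows how the namespace nests under
    // the root, e.g. "/flink" + "/application_1468929183239_0001".
    static String effectiveZkPath(Properties config) {
        String root = config.getProperty("recovery.zookeeper.path.root", "/flink");
        // on YARN the namespace would default to the application id;
        // standalone mode falls back to "/default_ns"
        String ns = config.getProperty("recovery.zookeeper.path.namespace", "/default_ns");
        return root + ns; // e.g. "/flink/default_ns"
    }
}
{code}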
[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383903#comment-15383903 ]

ASF GitHub Bot commented on FLINK-4166:

Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2249#discussion_r71311352

--- Diff: docs/setup/config.md ---
@@ -272,7 +272,9 @@ For example when running Flink on YARN on an environment with a restrictive fire

 - `recovery.zookeeper.quorum`: Defines the ZooKeeper quorum URL which is used to connet to the ZooKeeper cluster when the 'zookeeper' recovery mode is selected

-- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create znodes.
+- `recovery.zookeeper.path.root`: (Default '/flink') Defines the root dir under which the ZooKeeper recovery mode will create namespace directories.
+
+- `recovery.zookeeper.path.namespace`: (Default '/default_ns' in standalone mode, or the <yarn-application-id> under Yarn) Defines the subdirectory under the root dir where the ZooKeeper recovery mode will create znodes. This allows to isolate multiple applications on the same ZooKeeper.

--- End diff --

You're right, it would just be a heuristic for generating a unique namespace, but it is not truly unique because multiple Flink instances might be running on the same host names (e.g. multiple processes, Docker containers). So +1 for not adding something that can cause hard-to-debug problems. We have covered the Yarn use case, which makes it very convenient to set up a namespace.

> Generate automatic different namespaces in Zookeeper for Flink applications
>
> Key: FLINK-4166
> URL: https://issues.apache.org/jira/browse/FLINK-4166
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.0.3
> Reporter: Stefan Richter
>
> We should automatically generate different namespaces per Flink application in Zookeeper to avoid interference between different applications that refer to the same Zookeeper entries.
[GitHub] flink pull request #2268: [FLINK-4232] Make sure ./bin/flink returns correct...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2268
[jira] [Resolved] (FLINK-4232) Flink executable does not return correct pid
[ https://issues.apache.org/jira/browse/FLINK-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek resolved FLINK-4232.

Resolution: Fixed
Fix Version/s: 1.1.0

Fixed in https://github.com/apache/flink/commit/de8406aab60ddd7ed251965fefb03290d511db13

> Flink executable does not return correct pid
>
> Key: FLINK-4232
> URL: https://issues.apache.org/jira/browse/FLINK-4232
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.0.3
> Reporter: David Moravek
> Priority: Minor
> Fix For: 1.1.0
>
> E.g. when using supervisor, the pid returned by ./bin/flink is the pid of the shell executable instead of the Java process.
[jira] [Commented] (FLINK-4232) Flink executable does not return correct pid
[ https://issues.apache.org/jira/browse/FLINK-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383908#comment-15383908 ]

ASF GitHub Bot commented on FLINK-4232:

Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2268

> Flink executable does not return correct pid
>
> Key: FLINK-4232
> URL: https://issues.apache.org/jira/browse/FLINK-4232
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.0.3
> Reporter: David Moravek
> Priority: Minor
> Fix For: 1.1.0
>
> E.g. when using supervisor, the pid returned by ./bin/flink is the pid of the shell executable instead of the Java process.
[GitHub] flink issue #2249: [FLINK-4166] [CLI] Generate different namespaces for Zook...
Github user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/2249

Thanks for the review, Max! I changed the default namespace string and rebased to the latest master.
[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384002#comment-15384002 ]

ASF GitHub Bot commented on FLINK-4166:

Github user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/2249

Thanks for the review, Max! I changed the default namespace string and rebased to the latest master.

> Generate automatic different namespaces in Zookeeper for Flink applications
>
> Key: FLINK-4166
> URL: https://issues.apache.org/jira/browse/FLINK-4166
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.0.3
> Reporter: Stefan Richter
>
> We should automatically generate different namespaces per Flink application in Zookeeper to avoid interference between different applications that refer to the same Zookeeper entries.
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71322624

--- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java ---
@@ -78,6 +79,9 @@

 	/** The containers where a TaskManager is starting and we are waiting for it to register */
 	private final Map containersInLaunch;

+	/** The container where a TaskManager has been started and is running in */
+	private final Map containersLaunched;

--- End diff --

It is true that it holds the registered resources but it does not hold the launched containers. When a `JobManager` loses its leadership the list of registered workers will be cleared. In order to reconstruct the mapping `ResourceID --> Container`, you need this new map.
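A rough illustration of the bookkeeping being discussed (the field names come from the diff; `ResourceID` and `Container` here are plain type parameters standing in for the Flink and YARN types, and the transition logic is a sketch, not the PR's actual code):

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: containersInLaunch holds containers whose TaskManager has not
// yet registered; containersLaunched holds the rest. When a JobManager
// loses leadership, the registered-worker list is cleared, so the second
// map is what allows reconstructing the ResourceID -> Container mapping.
class ContainerBookkeeping<ResourceID, Container> {
    final Map<ResourceID, Container> containersInLaunch = new HashMap<>();
    final Map<ResourceID, Container> containersLaunched = new HashMap<>();

    void onTaskManagerRegistered(ResourceID id) {
        Container c = containersInLaunch.remove(id);
        if (c != null) {
            containersLaunched.put(id, c);
        }
    }
}
{code}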
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384005#comment-15384005 ]

ASF GitHub Bot commented on FLINK-4152:

Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71322624

--- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java ---
@@ -78,6 +79,9 @@

 	/** The containers where a TaskManager is starting and we are waiting for it to register */
 	private final Map containersInLaunch;

+	/** The container where a TaskManager has been started and is running in */
+	private final Map containersLaunched;

--- End diff --

It is true that it holds the registered resources but it does not hold the launched containers. When a `JobManager` loses its leadership the list of registered workers will be cleared. In order to reconstruct the mapping `ResourceID --> Container`, you need this new map.

> TaskManager registration exponential backoff doesn't work
>
> Key: FLINK-4152
> URL: https://issues.apache.org/jira/browse/FLINK-4152
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, TaskManager, YARN Client
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Attachments: logs.tgz
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many messages when registering at the JobManager. This is the log file: https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 It's logging more than 3000 messages in less than a minute. I don't think that this is the expected behavior.
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71323898

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -405,36 +374,13 @@ class JobManager(
     currentResourceManager match {
       case Some(rm) =>
-        val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
-        future.onComplete {
-          case scala.util.Success(response) =>
-            // the resource manager is available and answered
-            self ! response
-          case scala.util.Failure(t) =>
-            t match {
-              case _: TimeoutException =>
-                log.info("Attempt to register resource at ResourceManager timed out. Retrying")
-              case _ =>
-                log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
-            }
-            // slow or unreachable resource manager, register anyway and let the rm reconnect
-            self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
-            self ! decorateMessage(new ReconnectResourceManager(rm))
-        }(context.dispatcher)
-
+        log.info(s"Register task manager $resourceId at the resource manager.")
+        rm ! decorateMessage(new RegisterResource(msg))

--- End diff --

If I'm not mistaken then there is hardly any difference between a registered worker and a container in launch. So in the current implementation it shouldn't matter much whether a container is in state "being launched" or "launched". Thus, it does not make much of a difference whether this message arrives or not. Given that the `JobManager` does not yet use the RM to allocate new resources, it might actually be a good idea to regard the RM as a tool to notify the JM about TM failures. Everything else can be added once we actually need it.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384014#comment-15384014 ]

ASF GitHub Bot commented on FLINK-4152:

Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71323898

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -405,36 +374,13 @@ class JobManager(
     currentResourceManager match {
       case Some(rm) =>
-        val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
-        future.onComplete {
-          case scala.util.Success(response) =>
-            // the resource manager is available and answered
-            self ! response
-          case scala.util.Failure(t) =>
-            t match {
-              case _: TimeoutException =>
-                log.info("Attempt to register resource at ResourceManager timed out. Retrying")
-              case _ =>
-                log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
-            }
-            // slow or unreachable resource manager, register anyway and let the rm reconnect
-            self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
-            self ! decorateMessage(new ReconnectResourceManager(rm))
-        }(context.dispatcher)
-
+        log.info(s"Register task manager $resourceId at the resource manager.")
+        rm ! decorateMessage(new RegisterResource(msg))

--- End diff --

If I'm not mistaken then there is hardly any difference between a registered worker and a container in launch. So in the current implementation it shouldn't matter much whether a container is in state "being launched" or "launched". Thus, it does not make much of a difference whether this message arrives or not. Given that the `JobManager` does not yet use the RM to allocate new resources, it might actually be a good idea to regard the RM as a tool to notify the JM about TM failures. Everything else can be added once we actually need it.

> TaskManager registration exponential backoff doesn't work
>
> Key: FLINK-4152
> URL: https://issues.apache.org/jira/browse/FLINK-4152
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, TaskManager, YARN Client
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Attachments: logs.tgz
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many messages when registering at the JobManager. This is the log file: https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 It's logging more than 3000 messages in less than a minute. I don't think that this is the expected behavior.
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71325133

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -405,36 +374,13 @@ class JobManager(
     currentResourceManager match {
       case Some(rm) =>
-        val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
-        future.onComplete {
-          case scala.util.Success(response) =>
-            // the resource manager is available and answered
-            self ! response
-          case scala.util.Failure(t) =>
-            t match {
-              case _: TimeoutException =>
-                log.info("Attempt to register resource at ResourceManager timed out. Retrying")
-              case _ =>
-                log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
-            }
-            // slow or unreachable resource manager, register anyway and let the rm reconnect
-            self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
-            self ! decorateMessage(new ReconnectResourceManager(rm))
-        }(context.dispatcher)
-
+        log.info(s"Register task manager $resourceId at the resource manager.")
+        rm ! decorateMessage(new RegisterResource(msg))

--- End diff --

I just don't understand why you remove this functionality. It was not broken in any way. Of course, we can always add features later (that is true for any component) but it changes the original RM design. If we want to add monitoring of the pool size later on, we will have to re-add the proper registration at the RM.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384026#comment-15384026 ]

ASF GitHub Bot commented on FLINK-4152:

Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71325133

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -405,36 +374,13 @@ class JobManager(
     currentResourceManager match {
       case Some(rm) =>
-        val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
-        future.onComplete {
-          case scala.util.Success(response) =>
-            // the resource manager is available and answered
-            self ! response
-          case scala.util.Failure(t) =>
-            t match {
-              case _: TimeoutException =>
-                log.info("Attempt to register resource at ResourceManager timed out. Retrying")
-              case _ =>
-                log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
-            }
-            // slow or unreachable resource manager, register anyway and let the rm reconnect
-            self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
-            self ! decorateMessage(new ReconnectResourceManager(rm))
-        }(context.dispatcher)
-
+        log.info(s"Register task manager $resourceId at the resource manager.")
+        rm ! decorateMessage(new RegisterResource(msg))

--- End diff --

I just don't understand why you remove this functionality. It was not broken in any way. Of course, we can always add features later (that is true for any component) but it changes the original RM design. If we want to add monitoring of the pool size later on, we will have to re-add the proper registration at the RM.

> TaskManager registration exponential backoff doesn't work
>
> Key: FLINK-4152
> URL: https://issues.apache.org/jira/browse/FLINK-4152
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, TaskManager, YARN Client
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Attachments: logs.tgz
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many messages when registering at the JobManager. This is the log file: https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 It's logging more than 3000 messages in less than a minute. I don't think that this is the expected behavior.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384029#comment-15384029 ]

ASF GitHub Bot commented on FLINK-4152:

Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71325200

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -405,36 +374,13 @@ class JobManager(
     currentResourceManager match {
       case Some(rm) =>
-        val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
-        future.onComplete {
-          case scala.util.Success(response) =>
-            // the resource manager is available and answered
-            self ! response
-          case scala.util.Failure(t) =>
-            t match {
-              case _: TimeoutException =>
-                log.info("Attempt to register resource at ResourceManager timed out. Retrying")
-              case _ =>
-                log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
-            }
-            // slow or unreachable resource manager, register anyway and let the rm reconnect
-            self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
-            self ! decorateMessage(new ReconnectResourceManager(rm))
-        }(context.dispatcher)
-
+        log.info(s"Register task manager $resourceId at the resource manager.")
+        rm ! decorateMessage(new RegisterResource(msg))

--- End diff --

But it might be necessary to add the number of launched containers to the number of containers in launch in `YarnFlinkResourceManager#getNumWorkersPendingRegistration`. Then the `checkWorkersPool` should behave correctly.

> TaskManager registration exponential backoff doesn't work
>
> Key: FLINK-4152
> URL: https://issues.apache.org/jira/browse/FLINK-4152
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, TaskManager, YARN Client
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Attachments: logs.tgz
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many messages when registering at the JobManager. This is the log file: https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 It's logging more than 3000 messages in less than a minute. I don't think that this is the expected behavior.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384030#comment-15384030 ]

ASF GitHub Bot commented on FLINK-4152:

Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71325247

--- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java ---
@@ -78,6 +79,9 @@

 	/** The containers where a TaskManager is starting and we are waiting for it to register */
 	private final Map containersInLaunch;

+	/** The container where a TaskManager has been started and is running in */
+	private final Map containersLaunched;

--- End diff --

What would be the drawback of not cleaning this list?

> TaskManager registration exponential backoff doesn't work
>
> Key: FLINK-4152
> URL: https://issues.apache.org/jira/browse/FLINK-4152
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, TaskManager, YARN Client
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Attachments: logs.tgz
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many messages when registering at the JobManager. This is the log file: https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 It's logging more than 3000 messages in less than a minute. I don't think that this is the expected behavior.
[GitHub] flink issue #2265: [FLINK-3097] [table] Add support for custom functions in ...
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2265

I was also thinking a lot about the names, because we currently have many `Function`s in Flink. I chose `UserDefinedFunction` as the top-level function for all user-defined functions such as `ScalarFunction`, `TableFunction`, `AggregateFunction`, or whatever will come in the future. If you have a look into the tests you will see that the Java API supports both postfix and infix notation. So you can also call functions as `hashCode(text)` if you like.
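As a hedged illustration of the two call styles mentioned here (the registration method name and the `HashCodeFunction` class are hypothetical, introduced only for this sketch):

{code}
// hypothetical registration of a scalar function under the name "hashCode"
tableEnv.registerFunction("hashCode", new HashCodeFunction());

// both notations are accepted by the Java Table API and resolve
// to the same function call
Table prefixCall = table.select("hashCode(text)");
Table postfixCall = table.select("text.hashCode()");
{code}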
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71325247

--- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java ---
@@ -78,6 +79,9 @@

 	/** The containers where a TaskManager is starting and we are waiting for it to register */
 	private final Map containersInLaunch;

+	/** The container where a TaskManager has been started and is running in */
+	private final Map containersLaunched;

--- End diff --

What would be the drawback of not cleaning this list?
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71325200

--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -405,36 +374,13 @@ class JobManager(
     currentResourceManager match {
      case Some(rm) =>
-        val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
-        future.onComplete {
-          case scala.util.Success(response) =>
-            // the resource manager is available and answered
-            self ! response
-          case scala.util.Failure(t) =>
-            t match {
-              case _: TimeoutException =>
-                log.info("Attempt to register resource at ResourceManager timed out. Retrying")
-              case _ =>
-                log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
-            }
-            // slow or unreachable resource manager, register anyway and let the rm reconnect
-            self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
-            self ! decorateMessage(new ReconnectResourceManager(rm))
-        }(context.dispatcher)
-
+        log.info(s"Register task manager $resourceId at the resource manager.")
+        rm ! decorateMessage(new RegisterResource(msg))

--- End diff --

But it might be necessary to add the number of launched containers to the number of containers in launch in `YarnFlinkResourceManager#getNumWorkersPendingRegistration`. Then the `checkWorkersPool` should behave correctly.
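A sketch of that suggestion (the method and field names appear in this review thread; the body is an assumption about what the adjustment could look like, not the PR's actual code):

{code}
// Hypothetical adjustment in YarnFlinkResourceManager: also count launched
// containers whose TaskManagers have not (re-)registered at the current
// leader as pending, so that checkWorkersPool does not over-allocate.
private int getNumWorkersPendingRegistration() {
    return containersInLaunch.size() + containersLaunched.size();
}
{code}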
[jira] [Commented] (FLINK-3097) Add support for custom functions in Table API
[ https://issues.apache.org/jira/browse/FLINK-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384031#comment-15384031 ]

ASF GitHub Bot commented on FLINK-3097:

Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2265

I was also thinking a lot about the names, because we currently have many `Function`s in Flink. I chose `UserDefinedFunction` as the top-level function for all user-defined functions such as `ScalarFunction`, `TableFunction`, `AggregateFunction`, or whatever will come in the future. If you have a look into the tests you will see that the Java API supports both postfix and infix notation. So you can also call functions as `hashCode(text)` if you like.

> Add support for custom functions in Table API
>
> Key: FLINK-3097
> URL: https://issues.apache.org/jira/browse/FLINK-3097
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: Timo Walther
> Assignee: Timo Walther
>
> Currently, the Table API has a very limited set of built-in functions. Support for custom functions can solve this problem. Adding a custom row function could look like:
>
> {code}
> TableEnvironment tableEnv = new TableEnvironment();
> RowFunction<String> rf = new RowFunction<String>() {
>     @Override
>     public String call(Object[] args) {
>         return ((String) args[0]).trim();
>     }
> };
> tableEnv.getConfig().registerRowFunction("TRIM", rf, BasicTypeInfo.STRING_TYPE_INFO);
> DataSource<Tuple1<String>> input = env.fromElements(new Tuple1<>(" 1 "));
> Table table = tableEnv.fromDataSet(input);
> Table result = table.select("TRIM(f0)");
> {code}
>
> This feature is also necessary as part of FLINK-2099.
[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384035#comment-15384035 ]

ASF GitHub Bot commented on FLINK-4166:

Github user mxm commented on the issue: https://github.com/apache/flink/pull/2249

Thank you! Merging after tests pass.

> Generate automatic different namespaces in Zookeeper for Flink applications
>
> Key: FLINK-4166
> URL: https://issues.apache.org/jira/browse/FLINK-4166
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.0.3
> Reporter: Stefan Richter
>
> We should automatically generate different namespaces per Flink application in Zookeeper to avoid interference between different applications that refer to the same Zookeeper entries.
[GitHub] flink issue #2249: [FLINK-4166] [CLI] Generate different namespaces for Zook...
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2249

Thank you! Merging after tests pass.
[GitHub] flink issue #2078: [FLINK-2985] Allow different field names for unionAll() i...
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2078

Thanks for the contribution @gallenvara. I will merge it now...
[jira] [Commented] (FLINK-2985) Allow different field names for unionAll() in Table API
[ https://issues.apache.org/jira/browse/FLINK-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384040#comment-15384040 ]

ASF GitHub Bot commented on FLINK-2985:

Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2078

Thanks for the contribution @gallenvara. I will merge it now...

> Allow different field names for unionAll() in Table API
>
> Key: FLINK-2985
> URL: https://issues.apache.org/jira/browse/FLINK-2985
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Timo Walther
> Priority: Minor
>
> The recently merged `unionAll` operator checks if the field names of the left and right side are equal. Actually, this is not necessary. The union operator in SQL checks only the types and uses the names of the left side.
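To spell out the behavior described in the issue, a hedged sketch (the table construction and data set names `ds1`/`ds2` are illustrative):

{code}
// Field names may differ between the two sides; only the field types
// must match. The result uses the field names of the left side (a, b).
Table left = tableEnv.fromDataSet(ds1, "a, b");
Table right = tableEnv.fromDataSet(ds2, "c, d");
Table result = left.unionAll(right);
{code}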
[GitHub] flink pull request #2078: [FLINK-2985] Allow different field names for union...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2078
[jira] [Commented] (FLINK-2985) Allow different field names for unionAll() in Table API
[ https://issues.apache.org/jira/browse/FLINK-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384046#comment-15384046 ]

ASF GitHub Bot commented on FLINK-2985:

Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2078

> Allow different field names for unionAll() in Table API
>
> Key: FLINK-2985
> URL: https://issues.apache.org/jira/browse/FLINK-2985
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Timo Walther
> Priority: Minor
>
> The recently merged `unionAll` operator checks if the field names of the left and right side are equal. Actually, this is not necessary. The union operator in SQL checks only the types and uses the names of the left side.
[jira] [Resolved] (FLINK-2985) Allow different field names for unionAll() in Table API
[ https://issues.apache.org/jira/browse/FLINK-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther resolved FLINK-2985.

Resolution: Fixed
Fix Version/s: 1.1.0

Fixed in 5758c91999efcf45457e4c25d4a75b13ce13e486.

> Allow different field names for unionAll() in Table API
>
> Key: FLINK-2985
> URL: https://issues.apache.org/jira/browse/FLINK-2985
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Timo Walther
> Priority: Minor
> Fix For: 1.1.0
>
> The recently merged `unionAll` operator checks if the field names of the left and right side are equal. Actually, this is not necessary. The union operator in SQL checks only the types and uses the names of the left side.
[GitHub] flink issue #2257: [FLINK-4152] Allow re-registration of TMs at resource man...
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2257 I think by imposing this contract we actually introduced problems we didn't have before. Thus, in order to remedy these problems for the upcoming release and go back a bit in the direction of the pre-RM age, I loosened the contract of the RM so that it can no longer reject TM registrations. This makes sense in my opinion, since we don't have a means to shut down orphaned TMs anyway. So in this version, the RM's task is to ensure that at least a predefined set of resources is allocated and to notify the JM about a TM death (not strictly mandatory). What we could actually do is also register orphaned TMs (or ones registered by a different RM). Then we wouldn't have the problem that we allocate too many resources for a JM.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384069#comment-15384069 ] ASF GitHub Bot commented on FLINK-4152: --- Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2257 I think by imposing this contract we actually introduced problems we didn't have before. Thus, in order to remedy these problems for the upcoming release and go back a bit in the direction of the pre-RM age, I loosened the contract of the RM so that it can no longer reject TM registrations. This makes sense in my opinion, since we don't have a means to shut down orphaned TMs anyway. So in this version, the RM's task is to ensure that at least a predefined set of resources is allocated and to notify the JM about a TM death (not strictly mandatory). What we could actually do is also register orphaned TMs (or ones registered by a different RM). Then we wouldn't have the problem that we allocate too many resources for a JM. > TaskManager registration exponential backoff doesn't work > - > > Key: FLINK-4152 > URL: https://issues.apache.org/jira/browse/FLINK-4152 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, TaskManager, YARN Client >Reporter: Robert Metzger >Assignee: Till Rohrmann > Attachments: logs.tgz > > > While testing Flink 1.1 I've found that the TaskManagers are logging many > messages when registering at the JobManager. > This is the log file: > https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 > It's logging more than 3000 messages in less than a minute. I don't think that > this is the expected behavior.
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71327721 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -405,36 +374,13 @@ class JobManager( currentResourceManager match { case Some(rm) => - val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout) - future.onComplete { -case scala.util.Success(response) => - // the resource manager is available and answered - self ! response -case scala.util.Failure(t) => - t match { -case _: TimeoutException => - log.info("Attempt to register resource at ResourceManager timed out. Retrying") -case _ => - log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t) - } - // slow or unreachable resource manager, register anyway and let the rm reconnect - self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg)) - self ! decorateMessage(new ReconnectResourceManager(rm)) - }(context.dispatcher) - + log.info(s"Register task manager $resourceId at the resource manager.") + rm ! decorateMessage(new RegisterResource(msg)) --- End diff -- Because it added complexity which is no longer needed. The current design still allows changing it later on. Therefore I don't see a problem in removing it. Just because we might need it in the future is, imho, not a good reason to keep unused code around.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384077#comment-15384077 ] ASF GitHub Bot commented on FLINK-4152: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71327721 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -405,36 +374,13 @@ class JobManager( currentResourceManager match { case Some(rm) => - val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout) - future.onComplete { -case scala.util.Success(response) => - // the resource manager is available and answered - self ! response -case scala.util.Failure(t) => - t match { -case _: TimeoutException => - log.info("Attempt to register resource at ResourceManager timed out. Retrying") -case _ => - log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t) - } - // slow or unreachable resource manager, register anyway and let the rm reconnect - self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg)) - self ! decorateMessage(new ReconnectResourceManager(rm)) - }(context.dispatcher) - + log.info(s"Register task manager $resourceId at the resource manager.") + rm ! decorateMessage(new RegisterResource(msg)) --- End diff -- Because it added complexity which is no longer needed. The current design still allows changing it later on. Therefore I don't see a problem in removing it. Just because we might need it in the future is, imho, not a good reason to keep unused code around. > TaskManager registration exponential backoff doesn't work > - > > Key: FLINK-4152 > URL: https://issues.apache.org/jira/browse/FLINK-4152 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, TaskManager, YARN Client >Reporter: Robert Metzger >Assignee: Till Rohrmann > Attachments: logs.tgz > > > While testing Flink 1.1 I've found that the TaskManagers are logging many > messages when registering at the JobManager. > This is the log file: > https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 > It's logging more than 3000 messages in less than a minute. I don't think that > this is the expected behavior.
[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases
[ https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384081#comment-15384081 ] Valentin Denisenkov commented on FLINK-2491: Can you please give an ETA for this issue? > Operators are not participating in state checkpointing in some cases > > > Key: FLINK-2491 > URL: https://issues.apache.org/jira/browse/FLINK-2491 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Robert Metzger >Assignee: Márton Balassi >Priority: Critical > Fix For: 1.0.0 > > > While implementing a test case for the Kafka Consumer, I came across the > following bug: > Consider the following topology, with the operator parallelism in parentheses: > Source (2) --> Sink (1). > In this setup, the {{snapshotState()}} method is called on the source, but > not on the Sink. > The sink receives the generated data. > Only one of the two sources is generating data. > I've implemented a test case for this, you can find it here: > https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java
[GitHub] flink issue #2257: [FLINK-4152] Allow re-registration of TMs at resource man...
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2257 I don't feel particularly great about breaking the contract between JobManager and ResourceManager but it is a path we can take until we expand the ResourceManager capabilities. The figures here would have to be updated as well: https://issues.apache.org/jira/browse/FLINK-3543 Overall, the changes look good to me. I don't think we can reach consensus on the registration matter. As of now, I don't want to block release fixes. So +1 to merge if you feel like it.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384094#comment-15384094 ] ASF GitHub Bot commented on FLINK-4152: --- Github user mxm commented on the issue: https://github.com/apache/flink/pull/2257 I don't feel particularly great about breaking the contract between JobManager and ResourceManager but it is a path we can take until we expand the ResourceManager capabilities. The figures here would have to be updated as well: https://issues.apache.org/jira/browse/FLINK-3543 Overall, the changes look good to me. I don't think we can reach consensus on the registration matter. As of now, I don't want to block release fixes. So +1 to merge if you feel like it. > TaskManager registration exponential backoff doesn't work > - > > Key: FLINK-4152 > URL: https://issues.apache.org/jira/browse/FLINK-4152 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, TaskManager, YARN Client >Reporter: Robert Metzger >Assignee: Till Rohrmann > Attachments: logs.tgz > > > While testing Flink 1.1 I've found that the TaskManagers are logging many > messages when registering at the JobManager. > This is the log file: > https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 > It's logging more than 3000 messages in less than a minute. I don't think that > this is the expected behavior.
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71331665 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java --- @@ -78,6 +79,9 @@ /** The containers where a TaskManager is starting and we are waiting for it to register */ private final Map containersInLaunch; + /** The container where a TaskManager has been started and is running in */ + private final Map containersLaunched; --- End diff -- You tell me, since you've authored this component. I guess there was a reason why you clear the list of registered resources when a JM loses its leadership.
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71331743 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java --- @@ -78,6 +79,9 @@ /** The containers where a TaskManager is starting and we are waiting for it to register */ private final Map containersInLaunch; + /** The container where a TaskManager has been started and is running in */ + private final Map containersLaunched; --- End diff -- I basically introduced this map to not break the semantic contract of the `FlinkResourceManager` component.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384104#comment-15384104 ] ASF GitHub Bot commented on FLINK-4152: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71331743 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java --- @@ -78,6 +79,9 @@ /** The containers where a TaskManager is starting and we are waiting for it to register */ private final Map containersInLaunch; + /** The container where a TaskManager has been started and is running in */ + private final Map containersLaunched; --- End diff -- I basically introduced this map to not break the semantic contract of the `FlinkResourceManager` component. > TaskManager registration exponential backoff doesn't work > - > > Key: FLINK-4152 > URL: https://issues.apache.org/jira/browse/FLINK-4152 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, TaskManager, YARN Client >Reporter: Robert Metzger >Assignee: Till Rohrmann > Attachments: logs.tgz > > > While testing Flink 1.1 I've found that the TaskManagers are logging many > messages when registering at the JobManager. > This is the log file: > https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 > It's logging more than 3000 messages in less than a minute. I don't think that > this is the expected behavior.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384103#comment-15384103 ] ASF GitHub Bot commented on FLINK-4152: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71331665 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java --- @@ -78,6 +79,9 @@ /** The containers where a TaskManager is starting and we are waiting for it to register */ private final Map containersInLaunch; + /** The container where a TaskManager has been started and is running in */ + private final Map containersLaunched; --- End diff -- You tell me, since you've authored this component. I guess there was a reason why you clear the list of registered resources when a JM loses its leadership. > TaskManager registration exponential backoff doesn't work > - > > Key: FLINK-4152 > URL: https://issues.apache.org/jira/browse/FLINK-4152 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, TaskManager, YARN Client >Reporter: Robert Metzger >Assignee: Till Rohrmann > Attachments: logs.tgz > > > While testing Flink 1.1 I've found that the TaskManagers are logging many > messages when registering at the JobManager. > This is the log file: > https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 > It's logging more than 3000 messages in less than a minute. I don't think that > this is the expected behavior.
[jira] [Created] (FLINK-4233) Simplify leader election / leader session ID assignment
Stephan Ewen created FLINK-4233: --- Summary: Simplify leader election / leader session ID assignment Key: FLINK-4233 URL: https://issues.apache.org/jira/browse/FLINK-4233 Project: Flink Issue Type: Improvement Components: Distributed Coordination Affects Versions: 1.0.3 Reporter: Stephan Ewen Currently, there are two separate actions and znodes involved in leader election and communication of the leader session ID and leader URL. This leads to some quite elaborate code that tries to make sure that the leader session ID and leader URL always eventually converge to those of the leader. It is simpler to just encode both the ID and the URL into an id-string that is attached to the leader latch znode. One would have to create a new leader latch each time a contender re-applies for leadership.
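As a sketch of the proposed encoding, assuming the leader session ID is a UUID and the leader URL contains no embedded separator; the class and method names here are invented for illustration, not Flink code:

```java
import java.util.UUID;

// Packs the leader session ID and the leader URL into one string that can be
// attached to the leader latch znode, and unpacks it on the other side.
final class LeaderInfoCodec {
    private static final char SEPARATOR = ' ';

    static String encode(UUID leaderSessionId, String leaderUrl) {
        return leaderSessionId + String.valueOf(SEPARATOR) + leaderUrl;
    }

    static UUID decodeSessionId(String encoded) {
        return UUID.fromString(encoded.substring(0, encoded.indexOf(SEPARATOR)));
    }

    static String decodeUrl(String encoded) {
        return encoded.substring(encoded.indexOf(SEPARATOR) + 1);
    }
}
```

Since both values travel with the latch znode itself, a contender that re-applies for leadership simply creates a new latch entry carrying a fresh ID string, which avoids the convergence logic between separate znodes.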
[GitHub] flink issue #2249: [FLINK-4166] [CLI] Generate different namespaces for Zook...
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2249 Travis cache seems to be corrupted. Have you run `mvn verify` from the command-line?
[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384105#comment-15384105 ] ASF GitHub Bot commented on FLINK-4166: --- Github user mxm commented on the issue: https://github.com/apache/flink/pull/2249 Travis cache seems to be corrupted. Have you run `mvn verify` from the command-line? > Generate automatic different namespaces in Zookeeper for Flink applications > --- > > Key: FLINK-4166 > URL: https://issues.apache.org/jira/browse/FLINK-4166 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.0.3 >Reporter: Stefan Richter > > We should automatically generate different namespaces per Flink application > in Zookeeper to avoid interference between different applications that refer > to the same Zookeeper entries.
[GitHub] flink issue #2182: [Flink-4130] CallGenerator could generate illegal code wh...
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2182 Thanks for the contribution @unsleepy22. I changed the code a little bit, so that we can prevent an `if(false)` branch. Will merge...
[GitHub] flink pull request #2182: [Flink-4130] CallGenerator could generate illegal ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2182
[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r71332501 --- Diff: docs/apis/cli.md --- @@ -187,6 +187,8 @@ Action "run" compiles and runs a program. java.net.URLClassLoader}. -d,--detached If present, runs the job in detached mode +--configDir The configuration directory with which --- End diff -- I think there is a tab; we use only spaces for formatting here.
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384109#comment-15384109 ] ASF GitHub Bot commented on FLINK-4084: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r71332501 --- Diff: docs/apis/cli.md --- @@ -187,6 +187,8 @@ Action "run" compiles and runs a program. java.net.URLClassLoader}. -d,--detached If present, runs the job in detached mode +--configDir The configuration directory with which --- End diff -- I think there is a tab; we use only spaces for formatting here. > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily.
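For illustration, a hedged sketch of how such a `--configDir` option can be parsed with Apache Commons CLI (the builder API requires commons-cli 1.3+), which Flink's CliFrontend uses for argument parsing; the class name and fallback logic below are hypothetical, not the PR's actual code:

```java
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;

public class ConfigDirOptionSketch {
    public static void main(String[] args) throws Exception {
        Options options = new Options();
        options.addOption(Option.builder()
                .longOpt("configDir")
                .hasArg()
                .argName("configuration directory")
                .desc("The configuration directory with which the CLI is started")
                .build());

        // Parse leniently so action-specific arguments are not rejected here.
        CommandLine line = new DefaultParser().parse(options, args, true);

        // Fall back to the FLINK_CONF_DIR environment variable, as before.
        String configDir = line.hasOption("configDir")
                ? line.getOptionValue("configDir")
                : System.getenv("FLINK_CONF_DIR");
        System.out.println("Using configuration directory: " + configDir);
    }
}
```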
[GitHub] flink issue #2265: [FLINK-3097] [table] Add support for custom functions in ...
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2265 Ah ok, that's perfect! (about infix and postfix)
[jira] [Commented] (FLINK-3097) Add support for custom functions in Table API
[ https://issues.apache.org/jira/browse/FLINK-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384111#comment-15384111 ] ASF GitHub Bot commented on FLINK-3097: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2265 Ah ok, that's perfect! (about infix and postfix) > Add support for custom functions in Table API > - > > Key: FLINK-3097 > URL: https://issues.apache.org/jira/browse/FLINK-3097 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Timo Walther > > Currently, the Table API has a very limited set of built-in functions. > Support for custom functions can solve this problem. Adding a custom row > function could look like: > {code} > TableEnvironment tableEnv = new TableEnvironment(); > RowFunction rf = new RowFunction() { > @Override > public String call(Object[] args) { > return ((String) args[0]).trim(); > } > }; > tableEnv.getConfig().registerRowFunction("TRIM", rf, > BasicTypeInfo.STRING_TYPE_INFO); > DataSource<Tuple1<String>> input = env.fromElements( > new Tuple1<>(" 1 ")); > Table table = tableEnv.fromDataSet(input); > Table result = table.select("TRIM(f0)"); > {code} > This feature is also necessary as part of FLINK-2099.
[jira] [Commented] (FLINK-2491) Operators are not participating in state checkpointing in some cases
[ https://issues.apache.org/jira/browse/FLINK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384113#comment-15384113 ] Aljoscha Krettek commented on FLINK-2491: - I think we want to tackle this for the 1.2 release, which should happen roughly 3 months after 1.1 is out. We're currently in the last stages of releasing 1.1. > Operators are not participating in state checkpointing in some cases > > > Key: FLINK-2491 > URL: https://issues.apache.org/jira/browse/FLINK-2491 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Robert Metzger >Assignee: Márton Balassi >Priority: Critical > Fix For: 1.0.0 > > > While implementing a test case for the Kafka Consumer, I came across the > following bug: > Consider the following topology, with the operator parallelism in parentheses: > Source (2) --> Sink (1). > In this setup, the {{snapshotState()}} method is called on the source, but > not on the Sink. > The sink receives the generated data. > Only one of the two sources is generating data. > I've implemented a test case for this, you can find it here: > https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java
[jira] [Resolved] (FLINK-4130) CallGenerator could generate illegal code when taking no operands
[ https://issues.apache.org/jira/browse/FLINK-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther resolved FLINK-4130. - Resolution: Fixed Fix Version/s: 1.1.0 Fixed in dd53831aa00e8160e8db14d6de186ce8f1f82b92. > CallGenerator could generate illegal code when taking no operands > - > > Key: FLINK-4130 > URL: https://issues.apache.org/jira/browse/FLINK-4130 > Project: Flink > Issue Type: Bug > Components: Table API & SQL >Affects Versions: 1.1.0 >Reporter: Cody >Priority: Minor > Fix For: 1.1.0 > > > In CallGenerator, when a call takes no operands, and null check is enabled, > it will generate code like: > boolean isNull$17 = ; > which will fail to compile at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
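To spell out the shape of the problem: with null checking enabled and zero operands there is nothing to derive the null flag from, so the generator emitted an empty initializer. A fragment sketching the broken versus fixed generated code follows; the synthetic names and the called function are illustrative, not actual CallGenerator output:

```java
// Broken output for a zero-operand call -- does not compile:
//     boolean isNull$17 = ;

// Fixed output: with no operands there is nothing that can be null,
// so the flag is initialized to a constant instead.
boolean isNull$17 = false;
String result$18 = isNull$17 ? null : someZeroArgFunction(); // hypothetical call
```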
[jira] [Commented] (FLINK-3097) Add support for custom functions in Table API
[ https://issues.apache.org/jira/browse/FLINK-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384115#comment-15384115 ] ASF GitHub Bot commented on FLINK-3097: --- Github user wuchong commented on the issue: https://github.com/apache/flink/pull/2265 Yes, I see. That's great! > Add support for custom functions in Table API > - > > Key: FLINK-3097 > URL: https://issues.apache.org/jira/browse/FLINK-3097 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Timo Walther > > Currently, the Table API has a very limited set of built-in functions. > Support for custom functions can solve this problem. Adding a custom row > function could look like: > {code} > TableEnvironment tableEnv = new TableEnvironment(); > RowFunction rf = new RowFunction() { > @Override > public String call(Object[] args) { > return ((String) args[0]).trim(); > } > }; > tableEnv.getConfig().registerRowFunction("TRIM", rf, > BasicTypeInfo.STRING_TYPE_INFO); > DataSource<Tuple1<String>> input = env.fromElements( > new Tuple1<>(" 1 ")); > Table table = tableEnv.fromDataSet(input); > Table result = table.select("TRIM(f0)"); > {code} > This feature is also necessary as part of FLINK-2099.
[GitHub] flink issue #2265: [FLINK-3097] [table] Add support for custom functions in ...
Github user wuchong commented on the issue: https://github.com/apache/flink/pull/2265 Yes, I see. That's great!
[GitHub] flink issue #2257: [FLINK-4152] Allow re-registration of TMs at resource man...
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2257 My main concern is actually that the JM-RM interaction is not well tested. Thus, I fear that the more complex the code is, the more room there is for mistakes. For example, I couldn't find a test where we test the `ReconnectResourceManager` functionality. Given that we're about to release 1.1 shortly, I would go for the simplest approach.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384133#comment-15384133 ] ASF GitHub Bot commented on FLINK-4152: --- Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2257 My main concern is actually that the JM-RM interaction is not well tested. Thus, I fear that the more complex the code is, the more room there is for mistakes. For example, I couldn't find a test where we test the `ReconnectResourceManager` functionality. Given that we're about to release 1.1 shortly, I would go for the simplest approach. > TaskManager registration exponential backoff doesn't work > - > > Key: FLINK-4152 > URL: https://issues.apache.org/jira/browse/FLINK-4152 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, TaskManager, YARN Client >Reporter: Robert Metzger >Assignee: Till Rohrmann > Attachments: logs.tgz > > > While testing Flink 1.1 I've found that the TaskManagers are logging many > messages when registering at the JobManager. > This is the log file: > https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 > It's logging more than 3000 messages in less than a minute. I don't think that > this is the expected behavior.
[GitHub] flink pull request #2269: [FLINK-4190] Generalise RollingSink to work with a...
GitHub user joshfg opened a pull request: https://github.com/apache/flink/pull/2269 [FLINK-4190] Generalise RollingSink to work with arbitrary buckets I've created a new bucketing package with a BucketingSink, which improves on the existing RollingSink by enabling arbitrary bucketing, rather than just rolling files based on system time. The main changes to support this are: - The Bucketer interface now takes the sink's input element as a generic parameter, enabling us to bucket based on attributes of the sink's input. - While maintaining the same rolling mechanics of the existing implementation (e.g. rolling when the file size reaches a threshold), the sink implementation can now have many 'active' buckets at any point in time. The checkpointing mechanics have been extended to support maintaining the state of multiple active buckets and files, instead of just one. - For use cases where the buckets being written to are changing over time, the sink now needs to determine when a bucket has become 'inactive', in order to flush and close the file. In the existing implementation, this is simply when the bucket path changes. Instead, we now determine a bucket as inactive if it hasn't been written to recently. To support this there are two additional user configurable settings: inactiveBucketCheckInterval and inactiveBucketThreshold. You can merge this pull request into a Git repository by running: $ git pull https://github.com/joshfg/flink flink-4190 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2269.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2269 commit 2011e47de6c8b3c087772c84b4b3e44210dbe50c Author: Josh Date: 2016-07-12T17:38:54Z [FLINK-4190] Generalise RollingSink to work with arbitrary buckets
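As a sketch of what the generalised interface described above enables, deriving the bucket from an attribute of the element instead of from the wall clock: the interface shape below is paraphrased from the PR description and the element type is hypothetical, so the merged signature may differ.

```java
import org.apache.hadoop.fs.Path;

// Paraphrased shape of the generalised Bucketer: the sink's input element
// is now a generic parameter, so buckets can depend on the data itself.
interface Bucketer<T> {
    Path getBucketPath(Path basePath, T element);
}

// Hypothetical element type carrying the attribute we bucket by.
class Event {
    final String country;
    Event(String country) { this.country = country; }
}

// Buckets records by country, so many buckets can be active at once; a
// bucket is closed only after the configured inactivity threshold elapses.
class CountryBucketer implements Bucketer<Event> {
    @Override
    public Path getBucketPath(Path basePath, Event element) {
        return new Path(basePath, element.country);
    }
}
```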
[jira] [Commented] (FLINK-4190) Generalise RollingSink to work with arbitrary buckets
[ https://issues.apache.org/jira/browse/FLINK-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384206#comment-15384206 ] Josh Forman-Gornall commented on FLINK-4190: Ok cool, thanks! I've just submitted a pull request. > Generalise RollingSink to work with arbitrary buckets > - > > Key: FLINK-4190 > URL: https://issues.apache.org/jira/browse/FLINK-4190 > Project: Flink > Issue Type: Improvement > Components: filesystem-connector, Streaming Connectors >Reporter: Josh Forman-Gornall >Assignee: Josh Forman-Gornall >Priority: Minor > > The current RollingSink implementation appears to be intended for writing to > directories that are bucketed by system time (e.g. minutely) and to only be > writing to one file within one bucket at any point in time. When the system > time determines that the current bucket should be changed, the current bucket > and file are closed and a new bucket and file are created. The sink cannot be > used for the more general problem of writing to arbitrary buckets, perhaps > determined by an attribute on the element/tuple being processed. > There are three limitations which prevent the existing sink from being used > for more general problems: > - Only bucketing by the current system time is supported, and not by e.g. an > attribute of the element being processed by the sink. > - Whenever the sink sees a change in the bucket being written to, it flushes > the file and moves on to the new bucket. Therefore the sink cannot have more > than one bucket/file open at a time. Additionally the checkpointing mechanics > only support saving the state of one active bucket and file. > - The sink determines that it should 'close' an active bucket and file when > the bucket path changes. We need another way to determine when a bucket has > become inactive and needs to be closed.
[GitHub] flink pull request #2257: [FLINK-4152] Allow re-registration of TMs at resou...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71350758 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java --- @@ -78,6 +79,9 @@ /** The containers where a TaskManager is starting and we are waiting for it to register */ private final Map containersInLaunch; + /** The container where a TaskManager has been started and is running in */ + private final Map containersLaunched; --- End diff -- >You tell me, since you've authored this component. I think there is a misunderstanding. I did author this component but it is not mine. I know you spent some time working on this and I just want to understand your motives. I thought actually you had tried this fix since you mentioned it in the JIRA. I think clearing the list is simply a bug and artifact of an old code design where the ResourceManager would be the central instance for TaskManager registration.
[jira] [Commented] (FLINK-4230) Session Windowing IT Case
[ https://issues.apache.org/jira/browse/FLINK-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384242#comment-15384242 ] ASF GitHub Bot commented on FLINK-4230: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2263 Nice piece of code! I finally understood how it works... 😃 Some remarks about the code: in some places there are method names that seem to stem from an initial implementation but don't match the current code anymore. For example, `SessionEventGeneratorDataSource.createTestStream()` returns a "generator" so it could be called `createGenerator()`. Also, there are some unused methods (for example in `EventGeneratorFactory`) and methods with generated Javadoc that don't have any actual content. Could you please have another pass over the code and remove the unused methods and remove or fix the Javadoc? Some of the classes could also use a class-level Javadoc. In `SessionEventGeneratorImpl`, the name `generateLateTimestamp()` might be a bit misleading. It just creates timestamps in the range of allowed timestamps. Both `InLatenessGenerator` and `AfterLatenessGenerator` use the method in the same way; just the behavior of `canGenerateEventAtWatermark()` determines whether the generated elements will be late or not. Here, a good comment on `canGenerateEventAtWatermark()` on the base interface might help. Also, it might make sense to make the testing source non-parallel. If we have parallelism 2 and one source regularly advances the watermark but the other source never advances the watermark, the elements that are generated as "late" by the first source are not considered late at the window operator because the watermark at the window operator cannot advance. > Session Windowing IT Case > - > > Key: FLINK-4230 > URL: https://issues.apache.org/jira/browse/FLINK-4230 > Project: Flink > Issue Type: Test > Components: DataStream API, Local Runtime >Reporter: Stefan Richter >Assignee: Stefan Richter > > An ITCase for Session Windows is missing that tests correct behavior under > several parallel sessions, with timely events, late events within and after > the lateness interval.
[GitHub] flink issue #2263: [FLINK-4230] [DataStreamAPI] Add Session Windowing ITCase
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2263 Nice piece of code! I finally understood how it works... 😃 Some remarks about the code: in some places there are method names that seem to stem from an initial implementation but don't match the current code anymore. For example, `SessionEventGeneratorDataSource.createTestStream()` returns a "generator" so it could be called `createGenerator()`. Also, there are some unused methods (for example in `EventGeneratorFactory`) and methods with generated Javadoc that don't have any actual content. Could you please have another pass over the code and remove the unused methods and remove or fix the Javadoc? Some of the classes could also use a class-level Javadoc. In `SessionEventGeneratorImpl`, the name `generateLateTimestamp()` might be a bit misleading. It just creates timestamps in the range of allowed timestamps. Both `InLatenessGenerator` and `AfterLatenessGenerator` use the method in the same way; just the behavior of `canGenerateEventAtWatermark()` determines whether the generated elements will be late or not. Here, a good comment on `canGenerateEventAtWatermark()` on the base interface might help. Also, it might make sense to make the testing source non-parallel. If we have parallelism 2 and one source regularly advances the watermark but the other source never advances the watermark, the elements that are generated as "late" by the first source are not considered late at the window operator because the watermark at the window operator cannot advance.
[jira] [Resolved] (FLINK-3792) RowTypeInfo equality should not depend on field names
[ https://issues.apache.org/jira/browse/FLINK-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther resolved FLINK-3792. - Resolution: Fixed Fix Version/s: 1.1.0 Fixed as part of FLINK-2985. > RowTypeInfo equality should not depend on field names > - > > Key: FLINK-3792 > URL: https://issues.apache.org/jira/browse/FLINK-3792 > Project: Flink > Issue Type: Bug > Components: Table API & SQL >Affects Versions: 1.1.0 >Reporter: Vasia Kalavri > Fix For: 1.1.0 > > > Currently, two Rows with the same field types but different field names are > not considered equal by the Table API and SQL. This behavior might create > problems, e.g. it makes the following union query fail: > {code} > SELECT STREAM a, b, c FROM T1 UNION ALL > (SELECT STREAM d, e, f FROM T2 WHERE d < 3) > {code} > where a, b, c and d, e, f are fields of corresponding types. > {code} > Cannot union streams of different types: org.apache.flink.api.table.Row(a: > Integer, b: Long, c: String) and org.apache.flink.api.table.Row(d: Integer, > e: Long, f: String) > {code}
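A compact illustration of the equality rule this fix establishes: field types decide equality position-wise, field names do not. This is a toy stand-in written for this digest, not Flink's actual RowTypeInfo implementation:

```java
import java.util.List;

final class RowType {
    final List<String> fieldNames;
    final List<Class<?>> fieldTypes;

    RowType(List<String> fieldNames, List<Class<?>> fieldTypes) {
        this.fieldNames = fieldNames;
        this.fieldTypes = fieldTypes;
    }

    @Override
    public boolean equals(Object o) {
        // Names are ignored: Row(a: Integer, b: Long) equals Row(d: Integer, e: Long).
        return o instanceof RowType
                && fieldTypes.equals(((RowType) o).fieldTypes);
    }

    @Override
    public int hashCode() {
        return fieldTypes.hashCode(); // must stay consistent with equals
    }
}
```

Under this rule, the union query from the issue description type-checks, because both sides reduce to the same (Integer, Long, String) row type.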
[jira] [Commented] (FLINK-4183) Move checking for StreamTableEnvironment into validation layer
[ https://issues.apache.org/jira/browse/FLINK-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384245#comment-15384245 ] ASF GitHub Bot commented on FLINK-4183: --- Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2221 Merging... > Move checking for StreamTableEnvironment into validation layer > -- > > Key: FLINK-4183 > URL: https://issues.apache.org/jira/browse/FLINK-4183 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Minor > > Some operators check the environment in `table.scala` instead of doing this > during the validation phase.
[GitHub] flink issue #2221: [FLINK-4183] [table] Move checking for StreamTableEnviron...
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/2221 Merging...
[jira] [Commented] (FLINK-4190) Generalise RollingSink to work with arbitrary buckets
[ https://issues.apache.org/jira/browse/FLINK-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384250#comment-15384250 ] Aljoscha Krettek commented on FLINK-4190: - Did you open the PR with the correct FLINK-4190 tag? Normally, it should show up here. Could you post a link to the PR? > Generalise RollingSink to work with arbitrary buckets > - > > Key: FLINK-4190 > URL: https://issues.apache.org/jira/browse/FLINK-4190 > Project: Flink > Issue Type: Improvement > Components: filesystem-connector, Streaming Connectors >Reporter: Josh Forman-Gornall >Assignee: Josh Forman-Gornall >Priority: Minor > > The current RollingSink implementation appears to be intended for writing to > directories that are bucketed by system time (e.g. minutely) and to only be > writing to one file within one bucket at any point in time. When the system > time determines that the current bucket should be changed, the current bucket > and file are closed and a new bucket and file are created. The sink cannot be > used for the more general problem of writing to arbitrary buckets, perhaps > determined by an attribute on the element/tuple being processed. > There are three limitations which prevent the existing sink from being used > for more general problems: > - Only bucketing by the current system time is supported, and not by e.g. an > attribute of the element being processed by the sink. > - Whenever the sink sees a change in the bucket being written to, it flushes > the file and moves on to the new bucket. Therefore the sink cannot have more > than one bucket/file open at a time. Additionally the checkpointing mechanics > only support saving the state of one active bucket and file. > - The sink determines that it should 'close' an active bucket and file when > the bucket path changes. We need another way to determine when a bucket has > become inactive and needs to be closed.
[GitHub] flink pull request #2221: [FLINK-4183] [table] Move checking for StreamTable...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2221
[jira] [Commented] (FLINK-4183) Move checking for StreamTableEnvironment into validation layer
[ https://issues.apache.org/jira/browse/FLINK-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384251#comment-15384251 ] ASF GitHub Bot commented on FLINK-4183: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2221 > Move checking for StreamTableEnvironment into validation layer > -- > > Key: FLINK-4183 > URL: https://issues.apache.org/jira/browse/FLINK-4183 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Minor > > Some operators check the environment in `table.scala` instead of doing this > during the validation phase.
[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384234#comment-15384234 ] ASF GitHub Bot commented on FLINK-4152: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2257#discussion_r71350758 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java --- @@ -78,6 +79,9 @@ /** The containers where a TaskManager is starting and we are waiting for it to register */ private final Map containersInLaunch; + /** The container where a TaskManager has been started and is running in */ + private final Map containersLaunched; --- End diff -- >You tell me, since you've authored this component. I think there is a misunderstanding. I did author this component but it is not mine. I know you spent some time working on this and I just want to understand your motives. I thought actually you had tried this fix since you mentioned it in the JIRA. I think clearing the list is simply a bug and artifact of an old code design where the ResourceManager would be the central instance for TaskManager registration. > TaskManager registration exponential backoff doesn't work > - > > Key: FLINK-4152 > URL: https://issues.apache.org/jira/browse/FLINK-4152 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, TaskManager, YARN Client >Reporter: Robert Metzger >Assignee: Till Rohrmann > Attachments: logs.tgz > > > While testing Flink 1.1 I've found that the TaskManagers are logging many > messages when registering at the JobManager. > This is the log file: > https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294 > It's logging more than 3000 messages in less than a minute. I don't think that > this is the expected behavior.
[jira] [Resolved] (FLINK-4183) Move checking for StreamTableEnvironment into validation layer
[ https://issues.apache.org/jira/browse/FLINK-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther resolved FLINK-4183. - Resolution: Fixed Fix Version/s: 1.1.0 Fixed in e85f787b280b63960e7f3add5aa8613b4ee23795. > Move checking for StreamTableEnvironment into validation layer > -- > > Key: FLINK-4183 > URL: https://issues.apache.org/jira/browse/FLINK-4183 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Minor > Fix For: 1.1.0 > > > Some operators check the environment in `table.scala` instead of doing this > during the validation phase.
[GitHub] flink issue #2263: [FLINK-4230] [DataStreamAPI] Add Session Windowing ITCase
Github user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/2263 Thanks a lot for the review. I agree on all your points and will address them.
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384268#comment-15384268 ] ASF GitHub Bot commented on FLINK-4084: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r71353752 --- Diff: flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java --- @@ -451,4 +479,25 @@ public static InfoOptions parseInfoCommand(String[] args) throws CliArgsExceptio } } + public static MainOptions parseMainCommand(String[] args) throws CliArgsException { + + // drop all arguments after an action + final List params= Arrays.asList(args); --- End diff -- space missing > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily.
[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r71353752 --- Diff: flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java --- @@ -451,4 +479,25 @@ public static InfoOptions parseInfoCommand(String[] args) throws CliArgsExceptio } } + public static MainOptions parseMainCommand(String[] args) throws CliArgsException { + + // drop all arguments after an action + final List params= Arrays.asList(args); --- End diff -- space missing
[jira] [Commented] (FLINK-4230) Session Windowing IT Case
[ https://issues.apache.org/jira/browse/FLINK-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384274#comment-15384274 ] ASF GitHub Bot commented on FLINK-4230: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2263 Oh, and I forgot: the check for the correct number of elements can be moved out of the window function and into the test itself, like this:
```
JobExecutionResult result = env.execute();

Assert.assertEquals(
    (LATE_EVENTS_PER_SESSION + 1) * NUMBER_OF_SESSIONS * EVENTS_PER_SESSION,
    result.getAccumulatorResult(SESSION_COUNTER_ON_TIME_KEY));
Assert.assertEquals(
    NUMBER_OF_SESSIONS * (LATE_EVENTS_PER_SESSION * (LATE_EVENTS_PER_SESSION + 1) / 2),
    result.getAccumulatorResult(SESSION_COUNTER_LATE_KEY));
```
Also, you can let the test class extend `StreamingMultipleProgramsTestBase`. This will set up a testing cluster with parallelism 4. You can then use this inside your test:
```
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
```
If you make the source non-parallel, the window operator will then run with parallelism 4, and counting the number of elements after the job is done will accumulate the counts from all parallel instances. > Session Windowing IT Case > - > > Key: FLINK-4230 > URL: https://issues.apache.org/jira/browse/FLINK-4230 > Project: Flink > Issue Type: Test > Components: DataStream API, Local Runtime >Reporter: Stefan Richter >Assignee: Stefan Richter > > An ITCase for Session Windows is missing that tests correct behavior under > several parallel sessions, with timely events, late events within and after > the lateness interval.
[GitHub] flink issue #2263: [FLINK-4230] [DataStreamAPI] Add Session Windowing ITCase
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2263 Oh, and I forgot: the check for the correct number of elements can be moved out of the window function and into the test itself, like this:
```
JobExecutionResult result = env.execute();

Assert.assertEquals(
    (LATE_EVENTS_PER_SESSION + 1) * NUMBER_OF_SESSIONS * EVENTS_PER_SESSION,
    result.getAccumulatorResult(SESSION_COUNTER_ON_TIME_KEY));
Assert.assertEquals(
    NUMBER_OF_SESSIONS * (LATE_EVENTS_PER_SESSION * (LATE_EVENTS_PER_SESSION + 1) / 2),
    result.getAccumulatorResult(SESSION_COUNTER_LATE_KEY));
```
Also, you can let the test class extend `StreamingMultipleProgramsTestBase`. This will set up a testing cluster with parallelism 4. You can then use this inside your test:
```
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
```
If you make the source non-parallel, the window operator will then run with parallelism 4, and counting the number of elements after the job is done will accumulate the counts from all parallel instances.
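For reference, a sketch of the suggested setup: only the source is pinned to parallelism 1, so a single watermark stream drives all parallel window operator instances. The source, key selector, window function, and constants below are placeholders standing in for the test's own classes, and the window API calls reflect the 1.1-era DataStream API:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.addSource(new SessionEventSource())        // placeholder source
   .setParallelism(1)                          // non-parallel: one watermark stream
   .keyBy(new SessionKeySelector())            // placeholder key selector
   .window(EventTimeSessionWindows.withGap(Time.milliseconds(MAX_SESSION_GAP_MS)))
   .allowedLateness(Time.milliseconds(ALLOWED_LATENESS_MS))
   .apply(new ValidatingWindowFunction());     // placeholder window function

JobExecutionResult result = env.execute();
```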
[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r71354207 --- Diff: flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java --- @@ -451,4 +479,25 @@ public static InfoOptions parseInfoCommand(String[] args) throws CliArgsExceptio } } + public static MainOptions parseMainCommand(String[] args) throws CliArgsException { + + // drop all arguments after an action + final List params= Arrays.asList(args); + for (String action: CliFrontend.ACTIONS) { + int index = params.indexOf(action); + if(index != -1) { + args = Arrays.copyOfRange(args, 0, index); --- End diff -- I think dropping the args is not necessary if you use the parsing below.
[jira] [Commented] (FLINK-4230) Session Windowing IT Case
[ https://issues.apache.org/jira/browse/FLINK-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384252#comment-15384252 ]

ASF GitHub Bot commented on FLINK-4230:
---------------------------------------

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/2263

Thanks a lot for the review. I agree on all your points and will address them.

> Session Windowing IT Case
> -------------------------
>
>                 Key: FLINK-4230
>                 URL: https://issues.apache.org/jira/browse/FLINK-4230
>             Project: Flink
>          Issue Type: Test
>          Components: DataStream API, Local Runtime
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>
> An ITCase for Session Windows is missing that tests correct behavior under several parallel sessions, with timely events, late events within and after the lateness interval.
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384276#comment-15384276 ]

ASF GitHub Bot commented on FLINK-4084:
---------------------------------------

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r71354207

--- Diff: flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java ---
@@ -451,4 +479,25 @@ public static InfoOptions parseInfoCommand(String[] args) throws CliArgsExceptio
 		}
 	}
 
+	public static MainOptions parseMainCommand(String[] args) throws CliArgsException {
+
+		// drop all arguments after an action
+		final List params= Arrays.asList(args);
+		for (String action: CliFrontend.ACTIONS) {
+			int index = params.indexOf(action);
+			if(index != -1) {
+				args = Arrays.copyOfRange(args, 0, index);
--- End diff --

I think dropping the args is not necessary if you use the parsing below.

> Add configDir parameter to CliFrontend and flink shell script
> --------------------------------------------------------------
>
>                 Key: FLINK-4084
>                 URL: https://issues.apache.org/jira/browse/FLINK-4084
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Andrea Sella
>            Priority: Minor
>
> At the moment there is no other way than the environment variable FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if it is started via the flink shell script. In order to improve the user experience, I would propose to introduce a {{--configDir}} parameter which the user can use to specify a configuration directory more easily.
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384278#comment-15384278 ]

ASF GitHub Bot commented on FLINK-4084:
---------------------------------------

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r71354329

--- Diff: flink-clients/src/main/java/org/apache/flink/client/cli/ProgramOptions.java ---
@@ -148,4 +148,5 @@ public boolean getDetachedMode() {
 	public String getSavepointPath() {
 		return savepointPath;
 	}
+
--- End diff --

The changes in this file are not necessary. Could you revert?

> Add configDir parameter to CliFrontend and flink shell script
> --------------------------------------------------------------
>
>                 Key: FLINK-4084
>                 URL: https://issues.apache.org/jira/browse/FLINK-4084
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Andrea Sella
>            Priority: Minor
>
> At the moment there is no other way than the environment variable FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if it is started via the flink shell script. In order to improve the user experience, I would propose to introduce a {{--configDir}} parameter which the user can use to specify a configuration directory more easily.
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384279#comment-15384279 ]

ASF GitHub Bot commented on FLINK-4084:
---------------------------------------

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r71354413

--- Diff: flink-clients/src/test/java/org/apache/flink/client/CliFrontendMainTest.java ---
@@ -0,0 +1,39 @@
+package org.apache.flink.client;
+
+
+import org.apache.flink.client.cli.CliArgsException;
+import org.apache.flink.client.cli.CliFrontendParser;
+import org.apache.flink.client.cli.MainOptions;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.FileNotFoundException;
+import java.net.MalformedURLException;
+
+import static org.apache.flink.client.CliFrontendTestUtils.getTestJarPath;
+import static org.junit.Assert.assertEquals;
+
+public class CliFrontendMainTest {
--- End diff --

Maybe `CliFrontendMainArgsTest`?

> Add configDir parameter to CliFrontend and flink shell script
> --------------------------------------------------------------
>
>                 Key: FLINK-4084
>                 URL: https://issues.apache.org/jira/browse/FLINK-4084
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Andrea Sella
>            Priority: Minor
>
> At the moment there is no other way than the environment variable FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if it is started via the flink shell script. In order to improve the user experience, I would propose to introduce a {{--configDir}} parameter which the user can use to specify a configuration directory more easily.
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384283#comment-15384283 ]

ASF GitHub Bot commented on FLINK-4084:
---------------------------------------

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r71354777

--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,21 +17,39 @@
 # limitations under the License.
--- End diff --

You changed the file mode from 644 to 755.

> Add configDir parameter to CliFrontend and flink shell script
> --------------------------------------------------------------
>
>                 Key: FLINK-4084
>                 URL: https://issues.apache.org/jira/browse/FLINK-4084
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Andrea Sella
>            Priority: Minor
>
> At the moment there is no other way than the environment variable FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if it is started via the flink shell script. In order to improve the user experience, I would propose to introduce a {{--configDir}} parameter which the user can use to specify a configuration directory more easily.
[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r71354132

--- Diff: flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java ---
@@ -451,4 +479,25 @@ public static InfoOptions parseInfoCommand(String[] args) throws CliArgsExceptio
 		}
 	}
 
+	public static MainOptions parseMainCommand(String[] args) throws CliArgsException {
+
+		// drop all arguments after an action
+		final List params= Arrays.asList(args);
+		for (String action: CliFrontend.ACTIONS) {
+			int index = params.indexOf(action);
+			if(index != -1) {
+				args = Arrays.copyOfRange(args, 0, index);
+				break;
+			}
+		}
+
+		try {
+			DefaultParser parser = new DefaultParser();
+			CommandLine line = parser.parse(MAIN_OPTIONS, args, false);
--- End diff --

Actually you can use `CommandLine line = parser.parse(MAIN_OPTIONS, args, true);` to halt once you reach a non-arg.
[jira] [Commented] (FLINK-4190) Generalise RollingSink to work with arbitrary buckets
[ https://issues.apache.org/jira/browse/FLINK-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384249#comment-15384249 ]

Aljoscha Krettek commented on FLINK-4190:
-----------------------------------------

Did you open the PR with the correct FLINK-4190 tag? Normally, it should show up here. Could you post a link to the PR?

> Generalise RollingSink to work with arbitrary buckets
> ------------------------------------------------------
>
>                 Key: FLINK-4190
>                 URL: https://issues.apache.org/jira/browse/FLINK-4190
>             Project: Flink
>          Issue Type: Improvement
>          Components: filesystem-connector, Streaming Connectors
>            Reporter: Josh Forman-Gornall
>            Assignee: Josh Forman-Gornall
>            Priority: Minor
>
> The current RollingSink implementation appears to be intended for writing to directories that are bucketed by system time (e.g. minutely) and to only be writing to one file within one bucket at any point in time. When the system time determines that the current bucket should be changed, the current bucket and file are closed and a new bucket and file are created. The sink cannot be used for the more general problem of writing to arbitrary buckets, perhaps determined by an attribute on the element/tuple being processed.
> There are three limitations which prevent the existing sink from being used for more general problems:
> - Only bucketing by the current system time is supported, and not by e.g. an attribute of the element being processed by the sink.
> - Whenever the sink sees a change in the bucket being written to, it flushes the file and moves on to the new bucket. Therefore the sink cannot have more than one bucket/file open at a time. Additionally the checkpointing mechanics only support saving the state of one active bucket and file.
> - The sink determines that it should 'close' an active bucket and file when the bucket path changes. We need another way to determine when a bucket has become inactive and needs to be closed.
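[Editor's note] To make the "arbitrary buckets" idea concrete, here is a hypothetical sketch of a bucketing contract driven by an element attribute instead of system time. The interface and class names below are invented for illustration only and are not the API that the PR introduces:

```java
import java.io.Serializable;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.Path;

/**
 * Hypothetical contract: derive the bucket directory from the element itself,
 * so multiple buckets can be active at the same time.
 */
interface Bucketer<T> extends Serializable {

    /** Determines the bucket directory for an element, independent of the current system time. */
    Path getBucketPath(Path basePath, T element);
}

/** Example: route (eventType, payload) tuples into one directory per event type. */
class EventTypeBucketer implements Bucketer<Tuple2<String, String>> {

    @Override
    public Path getBucketPath(Path basePath, Tuple2<String, String> element) {
        // f0 carries the event type; each distinct type gets its own bucket.
        // Deciding when such a bucket becomes inactive (the third limitation
        // above) would need a separate mechanism, e.g. an inactivity timeout.
        return new Path(basePath, element.f0);
    }
}
```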
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384275#comment-15384275 ]

ASF GitHub Bot commented on FLINK-4084:
---------------------------------------

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r71354132

--- Diff: flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java ---
@@ -451,4 +479,25 @@ public static InfoOptions parseInfoCommand(String[] args) throws CliArgsExceptio
 		}
 	}
 
+	public static MainOptions parseMainCommand(String[] args) throws CliArgsException {
+
+		// drop all arguments after an action
+		final List params= Arrays.asList(args);
+		for (String action: CliFrontend.ACTIONS) {
+			int index = params.indexOf(action);
+			if(index != -1) {
+				args = Arrays.copyOfRange(args, 0, index);
+				break;
+			}
+		}
+
+		try {
+			DefaultParser parser = new DefaultParser();
+			CommandLine line = parser.parse(MAIN_OPTIONS, args, false);
--- End diff --

Actually you can use `CommandLine line = parser.parse(MAIN_OPTIONS, args, true);` to halt once you reach a non-arg.

> Add configDir parameter to CliFrontend and flink shell script
> --------------------------------------------------------------
>
>                 Key: FLINK-4084
>                 URL: https://issues.apache.org/jira/browse/FLINK-4084
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Andrea Sella
>            Priority: Minor
>
> At the moment there is no other way than the environment variable FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if it is started via the flink shell script. In order to improve the user experience, I would propose to introduce a {{--configDir}} parameter which the user can use to specify a configuration directory more easily.
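[Editor's note] A small standalone sketch of the commons-cli behavior mxm refers to. With `stopAtNonOption = true`, the parser halts at the first token that is not a recognized option (here the action `run`), so the manual truncation of `args` before parsing becomes unnecessary. The option and argument values below are invented for this example:

```java
import java.util.Arrays;

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public final class StopAtNonOptionExample {

    public static void main(String[] args) throws ParseException {
        Options options = new Options();
        options.addOption(new Option(null, "configDir", true, "configuration directory"));

        // Frontend-level option first, then the action and its own arguments.
        String[] cli = {"--configDir", "/etc/flink", "run", "-p", "4", "job.jar"};

        // stopAtNonOption = true: parsing stops at "run"; everything from the
        // action onward is left untouched in the remaining-args list.
        CommandLine line = new DefaultParser().parse(options, cli, true);

        System.out.println(line.getOptionValue("configDir")); // /etc/flink
        System.out.println(Arrays.toString(line.getArgs()));  // [run, -p, 4, job.jar]
    }
}
```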