[jira] [Commented] (FLINK-5698) Add NestedFieldsProjectableTableSource interface
[ https://issues.apache.org/jira/browse/FLINK-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941615#comment-15941615 ] ASF GitHub Bot commented on FLINK-5698: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3269 > Add NestedFieldsProjectableTableSource interface > > > Key: FLINK-5698 > URL: https://issues.apache.org/jira/browse/FLINK-5698 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Reporter: Anton Solovev >Assignee: Anton Solovev > > Add a NestedFieldsProjectableTableSource interface for TableSource > implementations that support nested projection push-down. > The interface could look as follows: > {code} > trait NestedFieldsProjectableTableSource[T] { > def projectNestedFields(fields: Array[String]): > NestedFieldsProjectableTableSource[T] > } > {code} > This interface works together with ProjectableTableSource -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (FLINK-5698) Add NestedFieldsProjectableTableSource interface
[ https://issues.apache.org/jira/browse/FLINK-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-5698. Resolution: Implemented Fix Version/s: 1.3.0 Implemented with 5c37e55c83f854c1a9eb7bd7438b378b8c4b0a9f > Add NestedFieldsProjectableTableSource interface > > > Key: FLINK-5698 > URL: https://issues.apache.org/jira/browse/FLINK-5698 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Reporter: Anton Solovev >Assignee: Anton Solovev > Fix For: 1.3.0 > > > Add a NestedFieldsProjectableTableSource interface for TableSource > implementations that support nested projection push-down. > The interface could look as follows: > {code} > trait NestedFieldsProjectableTableSource[T] { > def projectNestedFields(fields: Array[String]): > NestedFieldsProjectableTableSource[T] > } > {code} > This interface works together with ProjectableTableSource -- This message was sent by Atlassian JIRA (v6.3.15#6346)
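To make the intent of the proposed interface concrete, here is a minimal sketch of a source implementing it. Everything in it is an assumption for illustration only: the Java rendering of the Scala trait, the `PersonTableSource` class, and the field paths are made up, and the interface actually merged for 1.3.0 may differ.

```java
import org.apache.flink.types.Row;

// Java rendering of the interface proposed in the issue description (assumption).
interface NestedFieldsProjectableTableSource<T> {
    NestedFieldsProjectableTableSource<T> projectNestedFields(String[] fields);
}

// Hypothetical source over records with nested fields such as "address.city".
class PersonTableSource implements NestedFieldsProjectableTableSource<Row> {

    private final String[] projectedFields;

    PersonTableSource(String[] projectedFields) {
        this.projectedFields = projectedFields;
    }

    @Override
    public NestedFieldsProjectableTableSource<Row> projectNestedFields(String[] fields) {
        // The optimizer pushes down only the nested field paths the query needs,
        // e.g. {"name", "address.city"}; the returned copy reads just those paths.
        return new PersonTableSource(fields);
    }
}
```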
[GitHub] flink pull request #3269: [FLINK-5698] Add NestedFieldsProjectableTableSourc...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3269 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2460: [FLINK-4562] table examples make an divided module in fli...
Github user shijinkui commented on the issue: https://github.com/apache/flink/pull/2460 > I agree with @wuchong , that we should follow the example of Gelly and add the flink-table-examples JAR file to the opt folder. @fhueske Thanks for your review. IMO, the directory layout below is reasonable: * the `examples` directory should only contain example jars * the `opt` directory should only contain optional library jars * the `lib` directory should only contain library jars that must be loaded at runtime The `opt` directory is currently noisy, as it contains both library jars and example jars: ``` flink-cep-scala_2.11-1.3.0.jar flink-gelly_2.11-1.3.0.jar flink-metrics-statsd-1.3.0.jar flink-cep_2.11-1.3.0.jar flink-metrics-dropwizard-1.3.0.jar flink-ml_2.11-1.3.0.jar flink-gelly-examples_2.11-1.3.0.jar flink-metrics-ganglia-1.3.0.jar flink-gelly-scala_2.11-1.3.0.jar flink-metrics-graphite-1.3.0.jar ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-4562) table examples make an divided module in flink-examples
[ https://issues.apache.org/jira/browse/FLINK-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941619#comment-15941619 ] ASF GitHub Bot commented on FLINK-4562: --- Github user shijinkui commented on the issue: https://github.com/apache/flink/pull/2460 > I agree with @wuchong , that we should follow the example of Gelly and add the flink-table-examples JAR file to the opt folder. @fhueske Thanks for your review. IMO, the directory layout below is reasonable: * the `examples` directory should only contain example jars * the `opt` directory should only contain optional library jars * the `lib` directory should only contain library jars that must be loaded at runtime The `opt` directory is currently noisy, as it contains both library jars and example jars: ``` flink-cep-scala_2.11-1.3.0.jar flink-gelly_2.11-1.3.0.jar flink-metrics-statsd-1.3.0.jar flink-cep_2.11-1.3.0.jar flink-metrics-dropwizard-1.3.0.jar flink-ml_2.11-1.3.0.jar flink-gelly-examples_2.11-1.3.0.jar flink-metrics-ganglia-1.3.0.jar flink-gelly-scala_2.11-1.3.0.jar flink-metrics-graphite-1.3.0.jar ``` > table examples make an divided module in flink-examples > --- > > Key: FLINK-4562 > URL: https://issues.apache.org/jira/browse/FLINK-4562 > Project: Flink > Issue Type: Improvement > Components: Examples, Table API & SQL >Reporter: shijinkui >Assignee: shijinkui > Fix For: 1.2.1 > > > example code shouldn't be packaged in the table module. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3386: [FLINK-5658][table] support unbounded eventtime ov...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/3386#discussion_r108030107 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamOverAggregate.scala --- @@ -112,7 +113,12 @@ class DataStreamOverAggregate( "condition.") } case _: RowTimeType => -throw new TableException("OVER Window of the EventTime type is not currently supported.") +if (overWindow.lowerBound.isUnbounded && overWindow.upperBound.isCurrentRow) { + createUnboundedAndCurrentRowEventTimeOverWindow(inputDS) --- End diff -- Yes, renaming the JIRA and creating a new one for the `RANGE` case would be great! Thanks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-6186) Remove unused import
[ https://issues.apache.org/jira/browse/FLINK-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941627#comment-15941627 ] ASF GitHub Bot commented on FLINK-6186: --- GitHub user zhengcanbin opened a pull request: https://github.com/apache/flink/pull/3612 [FLINK-6186][cleanup]Remove unused import. Remove unused import in org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengcanbin/flink flink-6186 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3612.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3612 commit d80b3f18b3a57d0637a99c6791b9d6c98267eb11 Author: zcb Date: 2017-03-25T07:42:03Z Remove unused import in StreamExecutionEnvironment.scala > Remove unused import > > > Key: FLINK-6186 > URL: https://issues.apache.org/jira/browse/FLINK-6186 > Project: Flink > Issue Type: Wish >Reporter: CanBin Zheng >Assignee: CanBin Zheng >Priority: Trivial > > Remove unused import org.apache.flink.api.java.ExecutionEnvironment in > org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scala -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3612: [FLINK-6186][cleanup]Remove unused import.
GitHub user zhengcanbin opened a pull request: https://github.com/apache/flink/pull/3612 [FLINK-6186][cleanup]Remove unused import. Remove unused import in org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengcanbin/flink flink-6186 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3612.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3612 commit d80b3f18b3a57d0637a99c6791b9d6c98267eb11 Author: zcb Date: 2017-03-25T07:42:03Z Remove unused import in StreamExecutionEnvironment.scala --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-6159) Add Java/Scala FlinkLauncher
[ https://issues.apache.org/jira/browse/FLINK-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941633#comment-15941633 ] CanBin Zheng commented on FLINK-6159: - [~StephanEwen] I'd like to give it a try; I think it's worth doing. > Add Java/Scala FlinkLauncher > > > Key: FLINK-6159 > URL: https://issues.apache.org/jira/browse/FLINK-6159 > Project: Flink > Issue Type: Improvement > Components: Client >Reporter: CanBin Zheng > Labels: features > > Now we can use flink.sh or yarn-session.sh to submit jobs/applications. I > think it would be quite convenient and helpful to have a pure Java/Scala FlinkLauncher > that lets users submit jobs/applications without any dependency on these shell > scripts. We can do this just like MapReduce does. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5829) Bump Calcite version to 1.12 once available
[ https://issues.apache.org/jira/browse/FLINK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941636#comment-15941636 ] ASF GitHub Bot commented on FLINK-5829: --- GitHub user haohui opened a pull request: https://github.com/apache/flink/pull/3613 [FLINK-5829] Bump Calcite version to 1.12 once available. This PR bumps the Calcite version from 1.11 to 1.12. The main issue is that it conflicts with FLINK-4288, as the `tableMap` field becomes protected, which makes unregistering tables no longer trivial. As a first iteration I have disabled the related tests in this PR. We need to come up with a solution. You can merge this pull request into a Git repository by running: $ git pull https://github.com/haohui/flink FLINK-5829 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3613.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3613 commit 02533b4dbc2ba66f9ff3ce2f68c415d1460a6258 Author: Haohui Mai Date: 2017-03-25T08:00:50Z [FLINK-5829] Bump Calcite version to 1.12 once available. > Bump Calcite version to 1.12 once available > --- > > Key: FLINK-5829 > URL: https://issues.apache.org/jira/browse/FLINK-5829 > Project: Flink > Issue Type: Task > Components: Table API & SQL >Reporter: Fabian Hueske >Assignee: Haohui Mai > > Once Calcite 1.12 is released we should update to remove some copied classes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3613: [FLINK-5829] Bump Calcite version to 1.12 once ava...
GitHub user haohui opened a pull request: https://github.com/apache/flink/pull/3613 [FLINK-5829] Bump Calcite version to 1.12 once available. This PR bumps the Calcite version from 1.11 to 1.12. The main issue is that it conflicts with FLINK-4288, as the `tableMap` field becomes protected, which makes unregistering tables no longer trivial. As a first iteration I have disabled the related tests in this PR. We need to come up with a solution. You can merge this pull request into a Git repository by running: $ git pull https://github.com/haohui/flink FLINK-5829 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3613.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3613 commit 02533b4dbc2ba66f9ff3ce2f68c415d1460a6258 Author: Haohui Mai Date: 2017-03-25T08:00:50Z [FLINK-5829] Bump Calcite version to 1.12 once available. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5829) Bump Calcite version to 1.12 once available
[ https://issues.apache.org/jira/browse/FLINK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941637#comment-15941637 ] Haohui Mai commented on FLINK-5829: --- I just pushed a PR. The migration process is relatively straightforward, except that Calcite 1.12 seems to conflict with FLINK-4288. The {{tableMap}} field has become a protected member, so unregistering tables becomes non-trivial. There are two options here. 1. Revert FLINK-4288. FLINK-4288 has not been released yet, so it is okay to pull it back with no concerns about backward compatibility. 2. Implement a proxy schema which inherits from {{CalciteSchema}} to regain access to the field. [~fhueske] [~twalthr], what do you think? > Bump Calcite version to 1.12 once available > --- > > Key: FLINK-5829 > URL: https://issues.apache.org/jira/browse/FLINK-5829 > Project: Flink > Issue Type: Task > Components: Table API & SQL >Reporter: Fabian Hueske >Assignee: Haohui Mai > > Once Calcite 1.12 is released we should update to remove some copied classes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-6188) Some setParallelism() method can't cope with default parallelism
Aljoscha Krettek created FLINK-6188: --- Summary: Some setParallelism() method can't cope with default parallelism Key: FLINK-6188 URL: https://issues.apache.org/jira/browse/FLINK-6188 Project: Flink Issue Type: Bug Components: DataStream API Affects Versions: 1.2.1 Reporter: Aljoscha Krettek Fix For: 1.2.1 Recent changes done for FLINK-5808 move default parallelism manifestation from eager to lazy, that is, the parallelism of operations that don't have an explicit parallelism is only set when generating the JobGraph. Some `setParallelism()` calls, such as `SingleOutputStreamOperator.setParallelism()` cannot deal with the fact that the parallelism of an operation might be {{-1}} (which indicates that it should take the default parallelism when generating the JobGraph). We should either revert the changes that fixed another user-facing bug for version 1.2.1 or fix the methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6188) Some setParallelism() method can't cope with default parallelism
[ https://issues.apache.org/jira/browse/FLINK-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-6188: Priority: Blocker (was: Major) > Some setParallelism() method can't cope with default parallelism > > > Key: FLINK-6188 > URL: https://issues.apache.org/jira/browse/FLINK-6188 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.1 >Reporter: Aljoscha Krettek >Priority: Blocker > Fix For: 1.2.1 > > > Recent changes done for FLINK-5808 move default parallelism manifestation > from eager to lazy, that is, the parallelism of operations that don't have an > explicit parallelism is only set when generating the JobGraph. Some > `setParallelism()` calls, such as > `SingleOutputStreamOperator.setParallelism()` cannot deal with the fact that > the parallelism of an operation might be {{-1}} (which indicates that it > should take the default parallelism when generating the JobGraph). > We should either revert the changes that fixed another user-facing bug for > version 1.2.1 or fix the methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
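For illustration, here is a minimal sketch of the kind of guard the affected setters would need so that the lazily-resolved default parallelism is accepted rather than rejected. The class, constant name, and exact check are assumptions, not the actual Flink code.

```java
/** Sketch of a parallelism setter that tolerates the "use default" sentinel. */
class OperatorSettings {

    /** Hypothetical sentinel mirroring the -1 that means "resolve at JobGraph generation". */
    static final int PARALLELISM_DEFAULT = -1;

    private int parallelism = PARALLELISM_DEFAULT;

    OperatorSettings setParallelism(int parallelism) {
        // Accept the sentinel instead of rejecting it, so operators without an explicit
        // parallelism keep deferring the decision until the JobGraph is generated.
        if (parallelism != PARALLELISM_DEFAULT && parallelism < 1) {
            throw new IllegalArgumentException("The parallelism of an operator must be at least 1.");
        }
        this.parallelism = parallelism;
        return this;
    }
}
```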
[jira] [Commented] (FLINK-6148) The Zookeeper client occur SASL error when the sasl is disable
[ https://issues.apache.org/jira/browse/FLINK-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941659#comment-15941659 ] zhangrucong1982 commented on FLINK-6148: OK, I will close the issue. > The Zookeeper client occur SASL error when the sasl is disable > -- > > Key: FLINK-6148 > URL: https://issues.apache.org/jira/browse/FLINK-6148 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.2.0 >Reporter: zhangrucong1982 > > I use Flink in a YARN cluster with version 1.2.0. HA is configured in > flink-conf.yaml, but SASL is disabled. The configuration is: > high-availability: zookeeper > high-availability.zookeeper.quorum: > 100.106.40.102:2181,100.106.57.136:2181,100.106.41.233:2181 > high-availability.zookeeper.storageDir: hdfs:/flink > high-availability.zookeeper.client.acl: open > high-availability.zookeeper.path.root: flink0308 > zookeeper.sasl.disable: true > The client log, JobManager log, and TaskManager log contain the following error > information: > 2017-03-22 11:18:24,662 WARN org.apache.zookeeper.ClientCnxn > - SASL configuration failed: > javax.security.auth.login.LoginException: No JAAS configuration section named > 'Client' was found in specified JAAS configuration file: > '/tmp/jaas-441937039502263015.conf'. Will continue connection to Zookeeper > server without SASL authentication, if Zookeeper server allows it. > 2017-03-22 11:18:24,663 ERROR > org.apache.flink.shaded.org.apache.curator.ConnectionState- > Authentication failed -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (FLINK-6148) The Zookeeper client occur SASL error when the sasl is disable
[ https://issues.apache.org/jira/browse/FLINK-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangrucong1982 closed FLINK-6148. -- Resolution: Done This issue is the same as FLINK-6117, which was resolved by CanBin Zheng. > The Zookeeper client occur SASL error when the sasl is disable > -- > > Key: FLINK-6148 > URL: https://issues.apache.org/jira/browse/FLINK-6148 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.2.0 >Reporter: zhangrucong1982 > > I use Flink in a YARN cluster with version 1.2.0. HA is configured in > flink-conf.yaml, but SASL is disabled. The configuration is: > high-availability: zookeeper > high-availability.zookeeper.quorum: > 100.106.40.102:2181,100.106.57.136:2181,100.106.41.233:2181 > high-availability.zookeeper.storageDir: hdfs:/flink > high-availability.zookeeper.client.acl: open > high-availability.zookeeper.path.root: flink0308 > zookeeper.sasl.disable: true > The client log, JobManager log, and TaskManager log contain the following error > information: > 2017-03-22 11:18:24,662 WARN org.apache.zookeeper.ClientCnxn > - SASL configuration failed: > javax.security.auth.login.LoginException: No JAAS configuration section named > 'Client' was found in specified JAAS configuration file: > '/tmp/jaas-441937039502263015.conf'. Will continue connection to Zookeeper > server without SASL authentication, if Zookeeper server allows it. > 2017-03-22 11:18:24,663 ERROR > org.apache.flink.shaded.org.apache.curator.ConnectionState- > Authentication failed -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-6189) Do not use yarn client config to do sanity check
Tao Wang created FLINK-6189: --- Summary: Do not use yarn client config to do sanity check Key: FLINK-6189 URL: https://issues.apache.org/jira/browse/FLINK-6189 Project: Flink Issue Type: Bug Components: YARN Reporter: Tao Wang Currently in the client, if #slots is greater than the number of "yarn.nodemanager.resource.cpu-vcores" in the YARN client config, the submission will be rejected. This makes no sense, as the actual vcores of the node manager are decided on the cluster side, not on the client side. If we don't set the config or don't set the right value for it (indeed, this config is not mandatory), it should not affect Flink submission. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (FLINK-6189) Do not use yarn client config to do sanity check
[ https://issues.apache.org/jira/browse/FLINK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Wang reassigned FLINK-6189: --- Assignee: Tao Wang > Do not use yarn client config to do sanity check > > > Key: FLINK-6189 > URL: https://issues.apache.org/jira/browse/FLINK-6189 > Project: Flink > Issue Type: Bug > Components: YARN >Reporter: Tao Wang >Assignee: Tao Wang > > Currently in the client, if #slots is greater than the number of > "yarn.nodemanager.resource.cpu-vcores" in the YARN client config, the submission > will be rejected. > This makes no sense, as the actual vcores of the node manager are decided on the cluster > side, not on the client side. If we don't set the config or don't set the right > value for it (indeed, this config is not mandatory), it should not > affect Flink submission. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3614: [FLINK-6189][YARN]Do not use yarn client config to...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/flink/pull/3614 [FLINK-6189][YARN]Do not use yarn client config to do sanity check Currently in the client, if #slots is greater than the number of "yarn.nodemanager.resource.cpu-vcores" in the YARN client config, the submission will be rejected. This makes no sense, as the actual vcores of the node manager are decided on the cluster side, not on the client side. If we don't set the config or don't set the right value for it (indeed, this config is not mandatory), it should not affect Flink submission. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WangTaoTheTonic/flink FLINK-6189 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3614.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3614 commit eef3bba405557c6f7c55aee6983bb1bd9ade7501 Author: WangTaoTheTonic Date: 2017-03-25T10:00:35Z Do not use yarn client config to do sanity check --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-6189) Do not use yarn client config to do sanity check
[ https://issues.apache.org/jira/browse/FLINK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941679#comment-15941679 ] ASF GitHub Bot commented on FLINK-6189: --- GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/flink/pull/3614 [FLINK-6189][YARN]Do not use yarn client config to do sanity check Currently in the client, if #slots is greater than the number of "yarn.nodemanager.resource.cpu-vcores" in the YARN client config, the submission will be rejected. This makes no sense, as the actual vcores of the node manager are decided on the cluster side, not on the client side. If we don't set the config or don't set the right value for it (indeed, this config is not mandatory), it should not affect Flink submission. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WangTaoTheTonic/flink FLINK-6189 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3614.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3614 commit eef3bba405557c6f7c55aee6983bb1bd9ade7501 Author: WangTaoTheTonic Date: 2017-03-25T10:00:35Z Do not use yarn client config to do sanity check > Do not use yarn client config to do sanity check > > > Key: FLINK-6189 > URL: https://issues.apache.org/jira/browse/FLINK-6189 > Project: Flink > Issue Type: Bug > Components: YARN >Reporter: Tao Wang >Assignee: Tao Wang > > Currently in the client, if #slots is greater than the number of > "yarn.nodemanager.resource.cpu-vcores" in the YARN client config, the submission > will be rejected. > This makes no sense, as the actual vcores of the node manager are decided on the cluster > side, not on the client side. If we don't set the config or don't set the right > value for it (indeed, this config is not mandatory), it should not > affect Flink submission. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6187) Cancel job failed with option -m on yarn session
[ https://issues.apache.org/jira/browse/FLINK-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-6187: -- Component/s: YARN > Cancel job failed with option -m on yarn session > > > Key: FLINK-6187 > URL: https://issues.apache.org/jira/browse/FLINK-6187 > Project: Flink > Issue Type: Bug > Components: Client, YARN >Reporter: Yuhong Hong > > 1. start yarn session: ./bin/yarn-session.sh -n 3 -jm 2048 -tm 3096 > 2. submit a job: ./bin/flink run ... > 3. cancel the job with option -m: > ./bin/flink cancel -m ip:port jobid > {code} > org.apache.flink.configuration.IllegalConfigurationException: Couldn't > retrieve client for cluster > at > org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:912) > at > org.apache.flink.client.CliFrontend.getJobManagerGateway(CliFrontend.java:926) > at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:602) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1079) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116) > Caused by: java.lang.RuntimeException: Failed to retrieve JobManager address > at > org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248) > at > org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:908) > ... 11 more > Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: > Could not retrieve the leader address and leader session ID. > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175) > at > org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242) > ... 12 more > Caused by: java.util.concurrent.TimeoutException: Futures timed out after > [6 milliseconds] > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:190) > at scala.concurrent.Await.result(package.scala) > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173) > ... 13 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6187) Cancel job failed with option -m on yarn session
[ https://issues.apache.org/jira/browse/FLINK-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-6187: -- Component/s: Client > Cancel job failed with option -m on yarn session > > > Key: FLINK-6187 > URL: https://issues.apache.org/jira/browse/FLINK-6187 > Project: Flink > Issue Type: Bug > Components: Client, YARN >Reporter: Yuhong Hong > > 1. start yarn session: ./bin/yarn-session.sh -n 3 -jm 2048 -tm 3096 > 2. submit a job: ./bin/flink run ... > 3. cancel the job with option -m: > ./bin/flink cancel -m ip:port jobid > {code} > org.apache.flink.configuration.IllegalConfigurationException: Couldn't > retrieve client for cluster > at > org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:912) > at > org.apache.flink.client.CliFrontend.getJobManagerGateway(CliFrontend.java:926) > at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:602) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1079) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116) > Caused by: java.lang.RuntimeException: Failed to retrieve JobManager address > at > org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248) > at > org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:908) > ... 11 more > Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: > Could not retrieve the leader address and leader session ID. > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175) > at > org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242) > ... 12 more > Caused by: java.util.concurrent.TimeoutException: Futures timed out after > [6 milliseconds] > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:190) > at scala.concurrent.Await.result(package.scala) > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173) > ... 13 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6186) Remove unused import
[ https://issues.apache.org/jira/browse/FLINK-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-6186: -- Component/s: Streaming Scala API > Remove unused import > > > Key: FLINK-6186 > URL: https://issues.apache.org/jira/browse/FLINK-6186 > Project: Flink > Issue Type: Wish > Components: DataStream API, Scala API, Streaming >Reporter: CanBin Zheng >Assignee: CanBin Zheng >Priority: Trivial > > Remove unused import org.apache.flink.api.java.ExecutionEnvironment in > org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scala -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6186) Remove unused import
[ https://issues.apache.org/jira/browse/FLINK-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-6186: -- Component/s: DataStream API > Remove unused import > > > Key: FLINK-6186 > URL: https://issues.apache.org/jira/browse/FLINK-6186 > Project: Flink > Issue Type: Wish > Components: DataStream API, Scala API, Streaming >Reporter: CanBin Zheng >Assignee: CanBin Zheng >Priority: Trivial > > Remove unused import org.apache.flink.api.java.ExecutionEnvironment in > org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scala -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6187) Cancel job failed with option -m on yarn session
[ https://issues.apache.org/jira/browse/FLINK-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941687#comment-15941687 ] Robert Metzger commented on FLINK-6187: --- Did you configure HA? Are you sure the {{ip:port}} you've passed were correct? I don't think this a bug in Flink. Its just that the command line interface can not connect to the JobManager for some reason. > Cancel job failed with option -m on yarn session > > > Key: FLINK-6187 > URL: https://issues.apache.org/jira/browse/FLINK-6187 > Project: Flink > Issue Type: Bug > Components: Client, YARN >Reporter: Yuhong Hong > > 1. start yarn session: ./bin/yarn-session.sh -n 3 -jm 2048 -tm 3096 > 2. submit a job: ./bin/flink run ... > 3. cancel the job with option -m: > ./bin/flink cancel -m ip:port jobid > {code} > org.apache.flink.configuration.IllegalConfigurationException: Couldn't > retrieve client for cluster > at > org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:912) > at > org.apache.flink.client.CliFrontend.getJobManagerGateway(CliFrontend.java:926) > at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:602) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1079) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116) > Caused by: java.lang.RuntimeException: Failed to retrieve JobManager address > at > org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248) > at > org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:908) > ... 11 more > Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: > Could not retrieve the leader address and leader session ID. > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175) > at > org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242) > ... 12 more > Caused by: java.util.concurrent.TimeoutException: Futures timed out after > [6 milliseconds] > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:190) > at scala.concurrent.Await.result(package.scala) > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173) > ... 13 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3612: [FLINK-6186][cleanup]Remove unused import.
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/3612 +1 to merge --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-6186) Remove unused import
[ https://issues.apache.org/jira/browse/FLINK-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941689#comment-15941689 ] ASF GitHub Bot commented on FLINK-6186: --- Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/3612 +1 to merge > Remove unused import > > > Key: FLINK-6186 > URL: https://issues.apache.org/jira/browse/FLINK-6186 > Project: Flink > Issue Type: Wish > Components: DataStream API, Scala API, Streaming >Reporter: CanBin Zheng >Assignee: CanBin Zheng >Priority: Trivial > > Remove unused import org.apache.flink.api.java.ExecutionEnvironment in > org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scala -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5998) Un-fat Hadoop from Flink fat jar
[ https://issues.apache.org/jira/browse/FLINK-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941690#comment-15941690 ] ASF GitHub Bot commented on FLINK-5998: --- Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/3604 Yes, the idea is that users in some environments can even delete `flink-dist-hadoop.jar` entirely, and just configure the classpath correctly to point to the Hadoop lib folder of their hadoop distribution. > Un-fat Hadoop from Flink fat jar > > > Key: FLINK-5998 > URL: https://issues.apache.org/jira/browse/FLINK-5998 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Robert Metzger >Assignee: Haohui Mai > > As a first step towards FLINK-2268, I would suggest to put all hadoop > dependencies into a jar separate from Flink's fat jar. > This would allow users to put a custom Hadoop jar in there, or even deploy > Flink without a Hadoop fat jar at all in environments where Hadoop is > provided (EMR). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3604: [FLINK-5998] Un-fat Hadoop from Flink fat jar
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/3604 Yes, the idea is that users in some environments can even delete `flink-dist-hadoop.jar` entirely, and just configure the classpath correctly to point to the Hadoop lib folder of their hadoop distribution. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #1668: [FLINK-3257] Add Exactly-Once Processing Guarantee...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1668#discussion_r108034229 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamIterationTail.java --- @@ -64,6 +70,22 @@ public void init() throws Exception { super.init(); } + @Override + protected boolean performCheckpoint(CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions, CheckpointMetrics checkpointMetrics) throws Exception { + LOG.debug("Starting checkpoint {} on task {}", checkpointMetaData.getCheckpointId(), getName()); + + synchronized (getCheckpointLock()) { + if (isRunning()) { + dataChannel.put(new Either.Right(new CheckpointBarrier(checkpointMetaData.getCheckpointId(), checkpointMetaData.getTimestamp(), checkpointOptions))); + getEnvironment().acknowledgeCheckpoint(checkpointMetaData.getCheckpointId(), checkpointMetrics); --- End diff -- Can the `IterationTailTask` contain operators as well, or is it always a task without operators? If it has operators, we cannot immediately acknowledge here, but need to delegate to superclass checkpoint method instead. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
[ https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941700#comment-15941700 ] ASF GitHub Bot commented on FLINK-3257: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/1668#discussion_r108034229 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamIterationTail.java --- @@ -64,6 +70,22 @@ public void init() throws Exception { super.init(); } + @Override + protected boolean performCheckpoint(CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions, CheckpointMetrics checkpointMetrics) throws Exception { + LOG.debug("Starting checkpoint {} on task {}", checkpointMetaData.getCheckpointId(), getName()); + + synchronized (getCheckpointLock()) { + if (isRunning()) { + dataChannel.put(new Either.Right(new CheckpointBarrier(checkpointMetaData.getCheckpointId(), checkpointMetaData.getTimestamp(), checkpointOptions))); + getEnvironment().acknowledgeCheckpoint(checkpointMetaData.getCheckpointId(), checkpointMetrics); --- End diff -- Can the `IterationTailTask` contain operators as well, or is it always a task without operators? If it has operators, we cannot immediately acknowledge here, but need to delegate to superclass checkpoint method instead. > Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs > --- > > Key: FLINK-3257 > URL: https://issues.apache.org/jira/browse/FLINK-3257 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Paris Carbone >Assignee: Paris Carbone > > The current snapshotting algorithm cannot support cycles in the execution > graph. An alternative scheme can potentially include records in-transit > through the back-edges of a cyclic execution graph (ABS [1]) to achieve the > same guarantees. > One straightforward implementation of ABS for cyclic graphs can work as > follows along the lines: > 1) Upon triggering a barrier in an IterationHead from the TaskManager start > block output and start upstream backup of all records forwarded from the > respective IterationSink. > 2) The IterationSink should eventually forward the current snapshotting epoch > barrier to the IterationSource. > 3) Upon receiving a barrier from the IterationSink, the IterationSource > should finalize the snapshot, unblock its output and emit all records > in-transit in FIFO order and continue the usual execution. > -- > Upon restart the IterationSource should emit all records from the injected > snapshot first and then continue its usual execution. > Several optimisations and slight variations can be potentially achieved but > this can be the initial implementation take. > [1] http://arxiv.org/abs/1506.08603 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6107) Add custom checkstyle for flink-streaming-java
[ https://issues.apache.org/jira/browse/FLINK-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941718#comment-15941718 ] ASF GitHub Bot commented on FLINK-6107: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/3567#discussion_r108035621 --- Diff: tools/maven/strict-checkstyle.xml --- @@ -0,0 +1,550 @@ + + +http://www.puppycrawl.com/dtds/configuration_1_3.dtd";> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + --- End diff -- The section comment is ``. > Add custom checkstyle for flink-streaming-java > -- > > Key: FLINK-6107 > URL: https://issues.apache.org/jira/browse/FLINK-6107 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > There was some consensus on the ML > (https://lists.apache.org/thread.html/94c8c5186b315c58c3f8aaf536501b99e8b92ee97b0034dee295ff6a@%3Cdev.flink.apache.org%3E) > that we want to have a more uniform code style. We should start > module-by-module and by introducing increasingly stricter rules. We have to > be aware of the PR situation and ensure that we have minimal breakage for > contributors. > This issue aims at adding a custom checkstyle.xml for > {{flink-streaming-java}} that is based on our current checkstyle.xml but adds > these checks for Javadocs: > {code} > > > > > > > > > > > > > > > > > > > > > > > > > {code} > This checks: > - Every type has a type-level Javadoc > - Proper use of {{}} in Javadocs > - First sentence must end with a proper punctuation mark > - Proper use (including closing) of HTML tags -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3567: [FLINK-6107] Add custom checkstyle for flink-strea...
Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/3567#discussion_r108035621 --- Diff: tools/maven/strict-checkstyle.xml --- @@ -0,0 +1,550 @@ + + +http://www.puppycrawl.com/dtds/configuration_1_3.dtd";> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + --- End diff -- The section comment is ``. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #3567: [FLINK-6107] Add custom checkstyle for flink-streaming-ja...
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/3567 @aljoscha, I likewise have no great preference for import order. I do think it is important for the checkstyle to match IntelliJ's code style, either the default or a provided Flink style. The default IntelliJ import style can be approximated: ``` ``` That includes a blank line between `javax` and `java` imports. There is an [old ticket with recent activity](https://github.com/checkstyle/checkstyle/issues/525) for this issue to allow blank lines to be explicitly defined. That said, I'd go for the Google Style as used in this PR. IntelliJ can import from a checkstyle configuration but I am not seeing any effect from this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-6107) Add custom checkstyle for flink-streaming-java
[ https://issues.apache.org/jira/browse/FLINK-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941731#comment-15941731 ] ASF GitHub Bot commented on FLINK-6107: --- Github user greghogan commented on the issue: https://github.com/apache/flink/pull/3567 @aljoscha, I likewise have no great preference for import order. I do think it is important for the checkstyle to match IntelliJ's code style, either the default or a provided Flink style. The default IntelliJ import style can be approximated: ``` ``` That includes a blank line between `javax` and `java` imports. There is an [old ticket with recent activity](https://github.com/checkstyle/checkstyle/issues/525) for this issue to allow blank lines to be explicitly defined. That said, I'd go for the Google Style as used in this PR. IntelliJ can import from a checkstyle configuration but I am not seeing any effect from this. > Add custom checkstyle for flink-streaming-java > -- > > Key: FLINK-6107 > URL: https://issues.apache.org/jira/browse/FLINK-6107 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > There was some consensus on the ML > (https://lists.apache.org/thread.html/94c8c5186b315c58c3f8aaf536501b99e8b92ee97b0034dee295ff6a@%3Cdev.flink.apache.org%3E) > that we want to have a more uniform code style. We should start > module-by-module and by introducing increasingly stricter rules. We have to > be aware of the PR situation and ensure that we have minimal breakage for > contributors. > This issue aims at adding a custom checkstyle.xml for > {{flink-streaming-java}} that is based on our current checkstyle.xml but adds > these checks for Javadocs: > {code} > > > > > > > > > > > > > > > > > > > > > > > > > {code} > This checks: > - Every type has a type-level Javadoc > - Proper use of {{}} in Javadocs > - First sentence must end with a proper punctuation mark > - Proper use (including closing) of HTML tags -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6185) Input readers and output writers/formats need to support gzip
[ https://issues.apache.org/jira/browse/FLINK-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941736#comment-15941736 ] Greg Hogan commented on FLINK-6185: --- There is some [support for this already|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html#read-compressed-files]. I would expect output compression to be similar to reading compressed input (see {{InflaterInputStreamFactory}}) and parallelism is not an issue. Is this a feature you would like to work on? > Input readers and output writers/formats need to support gzip > - > > Key: FLINK-6185 > URL: https://issues.apache.org/jira/browse/FLINK-6185 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.2.0 >Reporter: Luke Hutchison >Priority: Minor > > File sources (such as {{ExecutionEnvironment#readCsvFile()}}) and sinks (such > as {{FileOutputFormat}} and its subclasses, and methods such as > {{DataSet#writeAsText()}}) need the ability to transparently decompress and > compress files. Primarily gzip would be useful, but it would be nice if this > were pluggable to support bzip2, xz, etc. > There could be options for autodetect (based on file extension and/or file > content), which could be the default, as well as no compression or a selected > compression method. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
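For reference, the existing extension-based decompression that the linked documentation describes is used roughly as in the sketch below; the file path is made up, and output-side compression (what this issue asks for) has no counterpart yet.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class ReadGzippedCsv {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Files ending in ".gz" are decompressed transparently by the input format.
        // Compressed files are not splittable, so each file is read by a single task.
        DataSet<Tuple2<String, Integer>> events = env
            .readCsvFile("hdfs:///data/events.csv.gz")   // hypothetical path
            .types(String.class, Integer.class);

        events.first(10).print();
    }
}
```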
[GitHub] flink issue #3604: [FLINK-5998] Un-fat Hadoop from Flink fat jar
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/3604 @rmetzger thanks for the clarification. Sounds good! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5998) Un-fat Hadoop from Flink fat jar
[ https://issues.apache.org/jira/browse/FLINK-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941745#comment-15941745 ] ASF GitHub Bot commented on FLINK-5998: --- Github user greghogan commented on the issue: https://github.com/apache/flink/pull/3604 @rmetzger thanks for the clarification. Sounds good! > Un-fat Hadoop from Flink fat jar > > > Key: FLINK-5998 > URL: https://issues.apache.org/jira/browse/FLINK-5998 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Robert Metzger >Assignee: Haohui Mai > > As a first step towards FLINK-2268, I would suggest to put all hadoop > dependencies into a jar separate from Flink's fat jar. > This would allow users to put a custom Hadoop jar in there, or even deploy > Flink without a Hadoop fat jar at all in environments where Hadoop is > provided (EMR). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5754) released tag missing .gitigonore .travis.yml .gitattributes
[ https://issues.apache.org/jira/browse/FLINK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941752#comment-15941752 ] Greg Hogan commented on FLINK-5754: --- [~shijinkui], from a nice [StackOverflow answer|http://stackoverflow.com/questions/1457103/how-is-a-tag-different-from-a-branch-which-should-i-use-here] describing the differences between git branches and tags: A tag represents a version of a particular branch at a moment in time. A branch represents a separate thread of development that may run concurrently with other development efforts on the same code base. Changes to a branch may eventually be merged back into another branch to unify them. > released tag missing .gitigonore .travis.yml .gitattributes > > > Key: FLINK-5754 > URL: https://issues.apache.org/jira/browse/FLINK-5754 > Project: Flink > Issue Type: Bug > Components: Build System >Reporter: shijinkui > > The released tag is missing .gitignore, .travis.yml, and .gitattributes. > When making a release version, only the version should be replaced. > For example: https://github.com/apache/spark/tree/v2.1.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (FLINK-5754) released tag missing .gitigonore .travis.yml .gitattributes
[ https://issues.apache.org/jira/browse/FLINK-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941752#comment-15941752 ] Greg Hogan edited comment on FLINK-5754 at 3/25/17 3:03 PM: [~shijinkui], from a nice [StackOverflow answer|http://stackoverflow.com/questions/1457103/how-is-a-tag-different-from-a-branch-which-should-i-use-here] describing the differences between git branches and tags: bq. A tag represents a version of a particular branch at a moment in time. A branch represents a separate thread of development that may run concurrently with other development efforts on the same code base. Changes to a branch may eventually be merged back into another branch to unify them. Don't you want to work off Flink branches? was (Author: greghogan): [~shijinkui], from a nice [StackOverflow answer|http://stackoverflow.com/questions/1457103/how-is-a-tag-different-from-a-branch-which-should-i-use-here] describing the differences between git branches and tags: A tag represents a version of a particular branch at a moment in time. A branch represents a separate thread of development that may run concurrently with other development efforts on the same code base. Changes to a branch may eventually be merged back into another branch to unify them. > released tag missing .gitigonore .travis.yml .gitattributes > > > Key: FLINK-5754 > URL: https://issues.apache.org/jira/browse/FLINK-5754 > Project: Flink > Issue Type: Bug > Components: Build System >Reporter: shijinkui > > The released tag is missing .gitignore, .travis.yml, and .gitattributes. > When making a release version, only the version should be replaced. > For example: https://github.com/apache/spark/tree/v2.1.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5498) Add support for left/right outer joins with non-equality predicates (and 1+ equality predicates)
[ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941761#comment-15941761 ] lincoln.lee commented on FLINK-5498: @Fabian, I agree with you: adding another groupBy and applying a GroupReduceFunction is a reasonable trade-off between memory safety and efficiency, and your implementation looks like it works for FULL OUTER JOINs as well. I think it's the right direction for now. I'm working on some urgent internal issues until next weekend; I'll pick up this JIRA after that. > Add support for left/right outer joins with non-equality predicates (and 1+ > equality predicates) > > > Key: FLINK-5498 > URL: https://issues.apache.org/jira/browse/FLINK-5498 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 1.3.0 >Reporter: lincoln.lee >Assignee: lincoln.lee >Priority: Minor > > I found the expected result of a unit test case incorrect compared to that in > an RDBMS, > see > flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala > {code:title=JoinITCase.scala} > def testRightJoinWithNotOnlyEquiJoin(): Unit = { > ... > val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, > 'c) > val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, > 'f, 'g, 'h) > val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g) > > val expected = "Hello world,BCD\n" > val results = joinT.toDataSet[Row].collect() > TestBaseUtils.compareResultAsText(results.asJava, expected) > } > {code} > Then I took some time to learn about the ‘outer join’ in relational > databases, the right result of the above case should be (tested in SQL Server and > MySQL; the results are the same): > {code} > > select c, g from tuple3 right outer join tuple5 on a=f and b < h > > c g > > NULL Hallo > NULL Hallo Welt > NULL Hallo Welt wie > NULL Hallo Welt wie gehts? > NULL ABC > Hello world BCD > NULL CDE > NULL DEF > NULL EFG > NULL FGH > NULL GHI > NULL HIJ > NULL IJK > NULL JKL > NULL KLM > {code} > the join condition {{rightOuterJoin('a === 'd && 'b < 'h)}} is not equivalent > to {{rightOuterJoin('a === 'd).where('b < 'h)}}. > The problem is rooted in the code-generated {{JoinFunction}} (see > {{DataSetJoin.translateToPlan()}}, line 188). If the join condition does not > match, we must emit the outer row padded with nulls instead of returning from > the function without emitting anything. > The code-generated {{JoinFunction}} also includes the equality predicates. > These should be removed before generating the code, e.g., in > {{DataSetJoinRule}} when generating the {{DataSetJoin}} with help of > {{JoinInfo.getRemaining()}}. > More details: https://goo.gl/ngekca -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5929) Allow Access to Per-Window State in ProcessWindowFunction
[ https://issues.apache.org/jira/browse/FLINK-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941771#comment-15941771 ] ASF GitHub Bot commented on FLINK-5929: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/3479 Thanks for implementing this! 😃 I just merged, could you please close this PR? > Allow Access to Per-Window State in ProcessWindowFunction > - > > Key: FLINK-5929 > URL: https://issues.apache.org/jira/browse/FLINK-5929 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek > > Right now, the state that a {{WindowFunction}} or {{ProcessWindowFunction}} > can access is scoped to the key of the window but not the window itself. That > is, state is global across all windows for a given key. > For some use cases it is beneficial to keep state scoped to a window. For > example, if you expect to have several {{Trigger}} firings (due to early and > late firings) a user can keep state per window to keep some information > between those firings. > The per-window state has to be cleaned up in some way. For this I see two > options: > - Keep track of all state that a user uses and clean up when we reach the > window GC horizon. > - Add a method {{cleanup()}} to {{ProcessWindowFunction}} which is called > when we reach the window GC horizon that users can/should use to clean up > their state. > On the API side, we can add a method {{windowState()}} on > {{ProcessWindowFunction.Context}} that retrieves the per-window state and > {{globalState()}} that would allow access to the (already available) global > state. The {{Context}} would then look like this: > {code} > /** > * The context holding window metadata > */ > public abstract class Context { > /** > * @return The window that is being evaluated. > */ > public abstract W window(); > /** > * State accessor for per-key and per-window state. > */ > KeyedStateStore windowState(); > /** > * State accessor for per-key global state. > */ > KeyedStateStore globalState(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
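As a rough illustration of how the {{windowState()}} accessor described above can be used once merged (a sketch only; the class name, state name, and types are made up for the example), a {{ProcessWindowFunction}} can count how many times each window has fired:

{code}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Sketch: per-key, per-window state that survives early/late trigger firings of the
// same window but is not shared across windows (unlike globalState()).
public class FiringCounter extends ProcessWindowFunction<Long, String, String, TimeWindow> {

    private final ValueStateDescriptor<Long> firingsDesc =
            new ValueStateDescriptor<>("firings", Long.class);

    @Override
    public void process(String key, Context ctx, Iterable<Long> elements,
            Collector<String> out) throws Exception {
        ValueState<Long> firings = ctx.windowState().getState(firingsDesc);
        long count = (firings.value() == null ? 0L : firings.value()) + 1;
        firings.update(count);
        out.collect("key=" + key + ", window=" + ctx.window() + ", firing #" + count);
    }
}
{code}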
[GitHub] flink issue #3479: [FLINK-5929] Allow Access to Per-Window State in ProcessW...
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/3479 Thanks for implementing this! 😃 I just merged, could you please close this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Assigned] (FLINK-5929) Allow Access to Per-Window State in ProcessWindowFunction
[ https://issues.apache.org/jira/browse/FLINK-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek reassigned FLINK-5929: --- Assignee: Seth Wiesman > Allow Access to Per-Window State in ProcessWindowFunction > - > > Key: FLINK-5929 > URL: https://issues.apache.org/jira/browse/FLINK-5929 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Seth Wiesman > Fix For: 1.3.0 > > > Right now, the state that a {{WindowFunction}} or {{ProcessWindowFunction}} > can access is scoped to the key of the window but not the window itself. That > is, state is global across all windows for a given key. > For some use cases it is beneficial to keep state scoped to a window. For > example, if you expect to have several {{Trigger}} firings (due to early and > late firings) a user can keep state per window to keep some information > between those firings. > The per-window state has to be cleaned up in some way. For this I see two > options: > - Keep track of all state that a user uses and clean up when we reach the > window GC horizon. > - Add a method {{cleanup()}} to {{ProcessWindowFunction}} which is called > when we reach the window GC horizon that users can/should use to clean up > their state. > On the API side, we can add a method {{windowState()}} on > {{ProcessWindowFunction.Context}} that retrieves the per-window state and > {{globalState()}} that would allow access to the (already available) global > state. The {{Context}} would then look like this: > {code} > /** > * The context holding window metadata > */ > public abstract class Context { > /** > * @return The window that is being evaluated. > */ > public abstract W window(); > /** > * State accessor for per-key and per-window state. > */ > KeyedStateStore windowState(); > /** > * State accessor for per-key global state. > */ > KeyedStateStore globalState(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (FLINK-5929) Allow Access to Per-Window State in ProcessWindowFunction
[ https://issues.apache.org/jira/browse/FLINK-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed FLINK-5929. --- Resolution: Fixed Fix Version/s: 1.3.0 Implemented on master in fad201bfb0b1f2757f68f7b3ffaf97a486eb93e8 > Allow Access to Per-Window State in ProcessWindowFunction > - > > Key: FLINK-5929 > URL: https://issues.apache.org/jira/browse/FLINK-5929 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Seth Wiesman > Fix For: 1.3.0 > > > Right now, the state that a {{WindowFunction}} or {{ProcessWindowFunction}} > can access is scoped to the key of the window but not the window itself. That > is, state is global across all windows for a given key. > For some use cases it is beneficial to keep state scoped to a window. For > example, if you expect to have several {{Trigger}} firings (due to early and > late firings) a user can keep state per window to keep some information > between those firings. > The per-window state has to be cleaned up in some way. For this I see two > options: > - Keep track of all state that a user uses and clean up when we reach the > window GC horizon. > - Add a method {{cleanup()}} to {{ProcessWindowFunction}} which is called > when we reach the window GC horizon that users can/should use to clean up > their state. > On the API side, we can add a method {{windowState()}} on > {{ProcessWindowFunction.Context}} that retrieves the per-window state and > {{globalState()}} that would allow access to the (already available) global > state. The {{Context}} would then look like this: > {code} > /** > * The context holding window metadata > */ > public abstract class Context { > /** > * @return The window that is being evaluated. > */ > public abstract W window(); > /** > * State accessor for per-key and per-window state. > */ > KeyedStateStore windowState(); > /** > * State accessor for per-key global state. > */ > KeyedStateStore globalState(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5929) Allow Access to Per-Window State in ProcessWindowFunction
[ https://issues.apache.org/jira/browse/FLINK-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941773#comment-15941773 ] Aljoscha Krettek commented on FLINK-5929: - [~sjwiesman] I created FLINK-6163 and FLINK-6164 as follow-up issues. Just letting you know in case you're interested. :-) > Allow Access to Per-Window State in ProcessWindowFunction > - > > Key: FLINK-5929 > URL: https://issues.apache.org/jira/browse/FLINK-5929 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Seth Wiesman > Fix For: 1.3.0 > > > Right now, the state that a {{WindowFunction}} or {{ProcessWindowFunction}} > can access is scoped to the key of the window but not the window itself. That > is, state is global across all windows for a given key. > For some use cases it is beneficial to keep state scoped to a window. For > example, if you expect to have several {{Trigger}} firings (due to early and > late firings) a user can keep state per window to keep some information > between those firings. > The per-window state has to be cleaned up in some way. For this I see two > options: > - Keep track of all state that a user uses and clean up when we reach the > window GC horizon. > - Add a method {{cleanup()}} to {{ProcessWindowFunction}} which is called > when we reach the window GC horizon that users can/should use to clean up > their state. > On the API side, we can add a method {{windowState()}} on > {{ProcessWindowFunction.Context}} that retrieves the per-window state and > {{globalState()}} that would allow access to the (already available) global > state. The {{Context}} would then look like this: > {code} > /** > * The context holding window metadata > */ > public abstract class Context { > /** > * @return The window that is being evaluated. > */ > public abstract W window(); > /** > * State accessor for per-key and per-window state. > */ > KeyedStateStore windowState(); > /** > * State accessor for per-key global state. > */ > KeyedStateStore globalState(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5498) Add support for left/right outer joins with non-equality predicates (and 1+ equality predicates)
[ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941789#comment-15941789 ] Fabian Hueske commented on FLINK-5498: -- Thanks [~lincoln.86xy], we might be able to tweak the plan and even avoid a sort for the GroupReduceFunction if we execute the join with a sort merge join strategy and sort the outer side on all attributes (with the key attributes being a prefix). > Add support for left/right outer joins with non-equality predicates (and 1+ > equality predicates) > > > Key: FLINK-5498 > URL: https://issues.apache.org/jira/browse/FLINK-5498 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 1.3.0 >Reporter: lincoln.lee >Assignee: lincoln.lee >Priority: Minor > > I found the expected result of a unit test case to be incorrect compared to that in > an RDBMS, > see > flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala > {code:title=JoinITCase.scala} > def testRightJoinWithNotOnlyEquiJoin(): Unit = { > ... > val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, > 'c) > val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, > 'f, 'g, 'h) > val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g) > > val expected = "Hello world,BCD\n" > val results = joinT.toDataSet[Row].collect() > TestBaseUtils.compareResultAsText(results.asJava, expected) > } > {code} > Then I took some time to learn about the ‘outer join’ in relational > databases; the right result of the above case should be (tested in SQL Server and > MySQL, the results are the same): > {code} > > select c, g from tuple3 right outer join tuple5 on a=f and b<h > > c g > NULL Hallo > NULL Hallo Welt > NULL Hallo Welt wie > NULL Hallo Welt wie gehts? > NULL ABC > Hello world BCD > NULL CDE > NULL DEF > NULL EFG > NULL FGH > NULL GHI > NULL HIJ > NULL IJK > NULL JKL > NULL KLM > {code} > The join condition {{rightOuterJoin('a === 'd && 'b < 'h)}} is not equivalent > to {{rightOuterJoin('a === 'd).where('b < 'h)}}. > The problem is rooted in the code-generated {{JoinFunction}} (see > {{DataSetJoin.translateToPlan()}}, line 188). If the join condition does not > match, we must emit the outer row padded with nulls instead of returning from > the function without emitting anything. > The code-generated {{JoinFunction}} also includes equality predicates. > These should be removed before generating the code, e.g., in > {{DataSetJoinRule}} when generating the {{DataSetJoin}} with the help of > {{JoinInfo.getRemaining()}}. > More details: https://goo.gl/ngekca -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3479: [FLINK-5929] Allow Access to Per-Window State in P...
Github user sjwiesman closed the pull request at: https://github.com/apache/flink/pull/3479 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #3479: [FLINK-5929] Allow Access to Per-Window State in ProcessW...
Github user sjwiesman commented on the issue: https://github.com/apache/flink/pull/3479 Done! Thank you for helping me get this feature merged in. This has to be one of the most painless commits I've ever made to an open source project of this size. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5929) Allow Access to Per-Window State in ProcessWindowFunction
[ https://issues.apache.org/jira/browse/FLINK-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941792#comment-15941792 ] ASF GitHub Bot commented on FLINK-5929: --- Github user sjwiesman closed the pull request at: https://github.com/apache/flink/pull/3479 > Allow Access to Per-Window State in ProcessWindowFunction > - > > Key: FLINK-5929 > URL: https://issues.apache.org/jira/browse/FLINK-5929 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Seth Wiesman > Fix For: 1.3.0 > > > Right now, the state that a {{WindowFunction}} or {{ProcessWindowFunction}} > can access is scoped to the key of the window but not the window itself. That > is, state is global across all windows for a given key. > For some use cases it is beneficial to keep state scoped to a window. For > example, if you expect to have several {{Trigger}} firings (due to early and > late firings) a user can keep state per window to keep some information > between those firings. > The per-window state has to be cleaned up in some way. For this I see two > options: > - Keep track of all state that a user uses and clean up when we reach the > window GC horizon. > - Add a method {{cleanup()}} to {{ProcessWindowFunction}} which is called > when we reach the window GC horizon that users can/should use to clean up > their state. > On the API side, we can add a method {{windowState()}} on > {{ProcessWindowFunction.Context}} that retrieves the per-window state and > {{globalState()}} that would allow access to the (already available) global > state. The {{Context}} would then look like this: > {code} > /** > * The context holding window metadata > */ > public abstract class Context { > /** > * @return The window that is being evaluated. > */ > public abstract W window(); > /** > * State accessor for per-key and per-window state. > */ > KeyedStateStore windowState(); > /** > * State accessor for per-key global state. > */ > KeyedStateStore globalState(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5929) Allow Access to Per-Window State in ProcessWindowFunction
[ https://issues.apache.org/jira/browse/FLINK-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941791#comment-15941791 ] ASF GitHub Bot commented on FLINK-5929: --- Github user sjwiesman commented on the issue: https://github.com/apache/flink/pull/3479 Done! Thank you for helping me get this feature merged in. This has to be one of the most painless commits I've ever made to an open source project of this size. > Allow Access to Per-Window State in ProcessWindowFunction > - > > Key: FLINK-5929 > URL: https://issues.apache.org/jira/browse/FLINK-5929 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Seth Wiesman > Fix For: 1.3.0 > > > Right now, the state that a {{WindowFunction}} or {{ProcessWindowFunction}} > can access is scoped to the key of the window but not the window itself. That > is, state is global across all windows for a given key. > For some use cases it is beneficial to keep state scoped to a window. For > example, if you expect to have several {{Trigger}} firings (due to early and > late firings) a user can keep state per window to keep some information > between those firings. > The per-window state has to be cleaned up in some way. For this I see two > options: > - Keep track of all state that a user uses and clean up when we reach the > window GC horizon. > - Add a method {{cleanup()}} to {{ProcessWindowFunction}} which is called > when we reach the window GC horizon that users can/should use to clean up > their state. > On the API side, we can add a method {{windowState()}} on > {{ProcessWindowFunction.Context}} that retrieves the per-window state and > {{globalState()}} that would allow access to the (already available) global > state. The {{Context}} would then look like this: > {code} > /** > * The context holding window metadata > */ > public abstract class Context { > /** > * @return The window that is being evaluated. > */ > public abstract W window(); > /** > * State accessor for per-key and per-window state. > */ > KeyedStateStore windowState(); > /** > * State accessor for per-key global state. > */ > KeyedStateStore globalState(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6185) Input readers and output writers/formats need to support gzip
[ https://issues.apache.org/jira/browse/FLINK-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941795#comment-15941795 ] Luke Hutchison commented on FLINK-6185: --- Ah, sorry that I missed that. I may contribute some code to Flink in the future, but unfortunately right now I'm maxed out, so I hope that all the bug reports I'm sending in constitute some sort of contribution. > Input readers and output writers/formats need to support gzip > - > > Key: FLINK-6185 > URL: https://issues.apache.org/jira/browse/FLINK-6185 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.2.0 >Reporter: Luke Hutchison >Priority: Minor > > File sources (such as {{ExecutionEnvironment#readCsvFile()}}) and sinks (such > as {{FileOutputFormat}} and its subclasses, and methods such as > {{DataSet#writeAsText()}}) need the ability to transparently decompress and > compress files. Primarily gzip would be useful, but it would be nice if this > were pluggable to support bzip2, xz, etc. > There could be options for autodetect (based on file extension and/or file > content), which could be the default, as well as no compression or a selected > compression method. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6185) Output writers and OutputFormats need to support compression
[ https://issues.apache.org/jira/browse/FLINK-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Hutchison updated FLINK-6185: -- Summary: Output writers and OutputFormats need to support compression (was: Output writers and OutputFormats need to support gzip) > Output writers and OutputFormats need to support compression > > > Key: FLINK-6185 > URL: https://issues.apache.org/jira/browse/FLINK-6185 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.2.0 >Reporter: Luke Hutchison >Priority: Minor > > File sources (such as {{ExecutionEnvironment#readCsvFile()}}) and sinks (such > as {{FileOutputFormat}} and its subclasses, and methods such as > {{DataSet#writeAsText()}}) need the ability to transparently decompress and > compress files. Primarily gzip would be useful, but it would be nice if this > were pluggable to support bzip2, xz, etc. > There could be options for autodetect (based on file extension and/or file > content), which could be the default, as well as no compression or a selected > compression method. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6185) Output writers and OutputFormats need to support gzip
[ https://issues.apache.org/jira/browse/FLINK-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Hutchison updated FLINK-6185: -- Summary: Output writers and OutputFormats need to support gzip (was: Input readers and output writers/formats need to support gzip) > Output writers and OutputFormats need to support gzip > - > > Key: FLINK-6185 > URL: https://issues.apache.org/jira/browse/FLINK-6185 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.2.0 >Reporter: Luke Hutchison >Priority: Minor > > File sources (such as {{ExecutionEnvironment#readCsvFile()}}) and sinks (such > as {{FileOutputFormat}} and its subclasses, and methods such as > {{DataSet#writeAsText()}}) need the ability to transparently decompress and > compress files. Primarily gzip would be useful, but it would be nice if this > were pluggable to support bzip2, xz, etc. > There could be options for autodetect (based on file extension and/or file > content), which could be the default, as well as no compression or a selected > compression method. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6185) Output writers and OutputFormats need to support compression
[ https://issues.apache.org/jira/browse/FLINK-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941797#comment-15941797 ] Luke Hutchison commented on FLINK-6185: --- Here's my simple gzip OutputFormat though, in case anyone else is looking for a quick solution: {code} import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.io.PrintWriter; import java.util.zip.GZIPOutputStream; import org.apache.flink.api.common.io.OutputFormat; import org.apache.flink.configuration.Configuration; import org.apache.flink.core.fs.FileSystem.WriteMode; public class GZippableTextOutputFormat<T> implements OutputFormat<T> { private static final long serialVersionUID = 1L; private File file; private PrintWriter writer; private boolean gzip; public GZippableTextOutputFormat(File file, WriteMode writeMode, boolean gzip) { if (writeMode != WriteMode.OVERWRITE) { // Make this explicit, since we're about to overwrite the file throw new IllegalArgumentException("writeMode must be WriteMode.OVERWRITE"); } this.file = file.getPath().endsWith(".gz") == gzip ? file : new File(file.getPath() + ".gz"); this.gzip = gzip; } @Override public void open(int taskNumber, int numTasks) throws IOException { if (taskNumber < 0) { throw new IllegalArgumentException("Invalid task number"); } if (numTasks == 0 || numTasks > 1) { throw new IllegalArgumentException( "must call setParallelism(1) to use " + ZippedJSONCollectionOutputFormat.class.getName()); } try { writer = gzip ? new PrintWriter(new GZIPOutputStream(new FileOutputStream(file))) : new PrintWriter(new FileOutputStream(file)); } catch (Exception e) { close(); throw new RuntimeException(e); } } public File getFile() { return file; } @Override public void configure(Configuration parameters) { } @Override public void writeRecord(T record) throws IOException { writer.println(record.toString()); } @Override public void close() throws IOException { writer.close(); } @Override public String toString() { return this.getClass().getSimpleName() + "(" + file + ")"; } } {code} > Output writers and OutputFormats need to support compression > > > Key: FLINK-6185 > URL: https://issues.apache.org/jira/browse/FLINK-6185 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.2.0 >Reporter: Luke Hutchison >Priority: Minor > > File sources (such as {{ExecutionEnvironment#readCsvFile()}}) and sinks (such > as {{FileOutputFormat}} and its subclasses, and methods such as > {{DataSet#writeAsText()}}) need the ability to transparently decompress and > compress files. Primarily gzip would be useful, but it would be nice if this > were pluggable to support bzip2, xz, etc. > There could be options for autodetect (based on file extension and/or file > content), which could be the default, as well as no compression or a selected > compression method. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
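For anyone who wants to try the class above as posted, a small usage sketch (assuming GZippableTextOutputFormat is on the classpath; the input data and output path are placeholders). Note that the format writes a single file, so the sink parallelism has to be 1:

{code}
import java.io.File;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem.WriteMode;

public class GzipSinkExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines = env.fromElements("first line", "second line");

        // Write the records as gzipped text; the format rejects any parallelism other than 1.
        lines.output(new GZippableTextOutputFormat<String>(
                new File("/tmp/example-output.txt"), WriteMode.OVERWRITE, true))
             .setParallelism(1);

        env.execute("gzip sink example");
    }
}
{code}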
[jira] [Commented] (FLINK-6115) Need more helpful error message when trying to serialize a tuple with a null field
[ https://issues.apache.org/jira/browse/FLINK-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941882#comment-15941882 ] Luke Hutchison commented on FLINK-6115: --- [~greghogan] versioning of durable binary formats is not only standard practice, it is important. Yes, allowing null values in Java was a big mistake, but option types didn't exist back when nullable references were added to Java. And an {{Option}} or {{Maybe}} type was the first thing I went looking for in Flink to address the non-nullable tuple field issue. Flink definitely needs an {{Option}} type until Java supports its own (just as having its own {{Tuple}} types is critical to the usefulness of Flink). (For that matter, {{flatMap}} should work over {{Option}} types, treating them as collected lists of length 0 or 1, I went looking for that too...) However, the fact that null pointers (and general lack of strong nullability analysis) have caused no manner of pain to users of Java doesn't mean that they are not crucial to the way that the language works today, or to how programmers tend to use it. Even Flink uses null values the way they're generally used in Java in place of {{Option}} types -- e.g. giving you null values on an outer join when there is no corresponding key in one dataset. Yes, throwing a NPE in tuple constructors misses the manual setting of field values, I mentioned that in an earlier comment. However, it actually highly surprised me when I noticed that the tuple fields were non-final, based on one of the first things I read in the Flink documentation: "Flink has the special classes {{DataSet}} and {{DataStream}} to represent data in a program. You can think of them as immutable collections of data that can contain duplicates." If the collections and streams that contain tuples are supposed to be thought of as immutable, why should the individual elements of those collections and streams be mutable? Perhaps tuple field values should be made final (which would of course be a breaking change for some users, and would probably especially require changes internally in the aggregation operator code). Setting aside the issue of null fields in tuples, this will surely not be the last time that the serialization format will need to change! What if, for example, Flink needs to add support for some future new Java type, or needs to support another JVM language that requires some extra metadata of some form to be stored along with its serialized objects? I strongly recommend versioning the checkpoint files. > Need more helpful error message when trying to serialize a tuple with a null > field > -- > > Key: FLINK-6115 > URL: https://issues.apache.org/jira/browse/FLINK-6115 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.2.0 >Reporter: Luke Hutchison > > When Flink tries to serialize a tuple with a null field, you get the > following, which has no information about where in the program the problem > occurred (all the stack trace lines are in Flink, not in user code). > {noformat} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
> at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.IllegalArgumentException: The record must not be null. > at > org.apache.flink.api.common.typeutils.base.array.StringArraySerializer.serialize(StringArraySerializer.java:73) > at > org.apache.flink.api.common.typeutils.base.array.StringArraySerializer.serialize(StringArraySerializer.java:33) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSeriali
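The {{Option}}-like behaviour [~lukehutch] describes, with {{flatMap}} treating a missing value as a collection of length 0 or 1, can at least be approximated today by emitting zero or one records instead of a tuple with a null field. A minimal sketch with made-up input data and field layout:

{code}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class NoNullTupleFieldsExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines = env.fromElements("a,1", "not parseable", "b,2");

        // Emit zero or one records per input ("None" vs. "Some"), so no tuple ever
        // carries a null field and the serializer never sees one.
        DataSet<Tuple2<String, Integer>> parsed = lines.flatMap(
                new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        String[] parts = line.split(",");
                        if (parts.length == 2) {
                            out.collect(Tuple2.of(parts[0], Integer.parseInt(parts[1])));
                        }
                    }
                });

        parsed.print();
    }
}
{code}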
[jira] [Comment Edited] (FLINK-6185) Output writers and OutputFormats need to support compression
[ https://issues.apache.org/jira/browse/FLINK-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941797#comment-15941797 ] Luke Hutchison edited comment on FLINK-6185 at 3/25/17 8:22 PM: Here's my simple gzip OutputFormat though, in case anyone else is looking for a quick solution: {code} import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.io.PrintWriter; import java.util.zip.GZIPOutputStream; import org.apache.flink.api.common.io.OutputFormat; import org.apache.flink.configuration.Configuration; import org.apache.flink.core.fs.FileSystem.WriteMode; public class GZippableTextOutputFormat<T> implements OutputFormat<T> { private static final long serialVersionUID = 1L; private File file; private PrintWriter writer; private boolean gzip; public GZippableTextOutputFormat(File file, WriteMode writeMode, boolean gzip) { if (writeMode != WriteMode.OVERWRITE) { // Make this explicit, since we're about to overwrite the file throw new IllegalArgumentException("writeMode must be WriteMode.OVERWRITE"); } this.file = file.getPath().endsWith(".gz") == gzip ? file : new File(file.getPath() + ".gz"); this.gzip = gzip; } @Override public void open(int taskNumber, int numTasks) throws IOException { if (taskNumber < 0) { throw new IllegalArgumentException("Invalid task number"); } if (numTasks == 0 || numTasks > 1) { throw new IllegalArgumentException( "must call setParallelism(1) to use " + ZippedJSONCollectionOutputFormat.class.getName()); } try { writer = gzip ? new PrintWriter(new GZIPOutputStream(new FileOutputStream(file))) : new PrintWriter(new FileOutputStream(file)); } catch (Exception e) { throw new RuntimeException(e); } } public File getFile() { return file; } @Override public void configure(Configuration parameters) { } @Override public void writeRecord(T record) throws IOException { writer.println(record.toString()); } @Override public void close() throws IOException { if (writer != null) { writer.close(); } } @Override public String toString() { return this.getClass().getSimpleName() + "(" + file + ")"; } } {code} was (Author: lukehutch): Here's my simple gzip OutputFormat though, in case anyone else is looking for a quick solution: {code} import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.io.PrintWriter; import java.util.zip.GZIPOutputStream; import org.apache.flink.api.common.io.OutputFormat; import org.apache.flink.configuration.Configuration; import org.apache.flink.core.fs.FileSystem.WriteMode; public class GZippableTextOutputFormat<T> implements OutputFormat<T> { private static final long serialVersionUID = 1L; private File file; private PrintWriter writer; private boolean gzip; public GZippableTextOutputFormat(File file, WriteMode writeMode, boolean gzip) { if (writeMode != WriteMode.OVERWRITE) { // Make this explicit, since we're about to overwrite the file throw new IllegalArgumentException("writeMode must be WriteMode.OVERWRITE"); } this.file = file.getPath().endsWith(".gz") == gzip ? file : new File(file.getPath() + ".gz"); this.gzip = gzip; } @Override public void open(int taskNumber, int numTasks) throws IOException { if (taskNumber < 0) { throw new IllegalArgumentException("Invalid task number"); } if (numTasks == 0 || numTasks > 1) { throw new IllegalArgumentException( "must call setParallelism(1) to use " + ZippedJSONCollectionOutputFormat.class.getName()); } try { writer = gzip ? new PrintWriter(new GZIPOutputStream(new FileOutputStream(file))) : new PrintWriter(new FileOutputStream(file)); } catch (Exception e) { close(); throw new RuntimeException(e); } } public File getFile() { return file; } @Override public void configure(Configuration parameters) { } @Override public void writeRecord(T record) throws IOException { writer.println(record.toString()); } @Override public void close() throws IOException { writer.close(); } @Override public String toString() { return this.getClass().getSimpleName() + "(" + file + ")"; } } {code} > Output writers and OutputFormats need to support compression > > > Key: FLINK-6185 >
[jira] [Comment Edited] (FLINK-6185) Output writers and OutputFormats need to support compression
[ https://issues.apache.org/jira/browse/FLINK-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941797#comment-15941797 ] Luke Hutchison edited comment on FLINK-6185 at 3/25/17 8:23 PM: Here's my simple gzip OutputFormat though, in case anyone else is looking for a quick solution: {code} import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.io.PrintWriter; import java.util.zip.GZIPOutputStream; import org.apache.flink.api.common.io.OutputFormat; import org.apache.flink.configuration.Configuration; import org.apache.flink.core.fs.FileSystem.WriteMode; public class GZippableTextOutputFormat<T> implements OutputFormat<T> { private static final long serialVersionUID = 1L; private File file; private PrintWriter writer; private boolean gzip; public GZippableTextOutputFormat(File file, WriteMode writeMode, boolean gzip) { if (writeMode != WriteMode.OVERWRITE) { // Make this explicit, since we're about to overwrite the file throw new IllegalArgumentException("writeMode must be WriteMode.OVERWRITE"); } this.file = file.getPath().endsWith(".gz") == gzip ? file : new File(file.getPath() + ".gz"); this.gzip = gzip; } @Override public void open(int taskNumber, int numTasks) throws IOException { if (taskNumber < 0) { throw new IllegalArgumentException("Invalid task number"); } if (numTasks == 0 || numTasks > 1) { throw new IllegalArgumentException( "must call setParallelism(1) to use " + ZippedJSONCollectionOutputFormat.class.getName()); } writer = gzip ? new PrintWriter(new GZIPOutputStream(new FileOutputStream(file))) : new PrintWriter(new FileOutputStream(file)); } public File getFile() { return file; } @Override public void configure(Configuration parameters) { } @Override public void writeRecord(T record) throws IOException { writer.println(record.toString()); } @Override public void close() throws IOException { if (writer != null) { writer.close(); } } @Override public String toString() { return this.getClass().getSimpleName() + "(" + file + ")"; } } {code} was (Author: lukehutch): Here's my simple gzip OutputFormat though, in case anyone else is looking for a quick solution: {code} import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.io.PrintWriter; import java.util.zip.GZIPOutputStream; import org.apache.flink.api.common.io.OutputFormat; import org.apache.flink.configuration.Configuration; import org.apache.flink.core.fs.FileSystem.WriteMode; public class GZippableTextOutputFormat<T> implements OutputFormat<T> { private static final long serialVersionUID = 1L; private File file; private PrintWriter writer; private boolean gzip; public GZippableTextOutputFormat(File file, WriteMode writeMode, boolean gzip) { if (writeMode != WriteMode.OVERWRITE) { // Make this explicit, since we're about to overwrite the file throw new IllegalArgumentException("writeMode must be WriteMode.OVERWRITE"); } this.file = file.getPath().endsWith(".gz") == gzip ? file : new File(file.getPath() + ".gz"); this.gzip = gzip; } @Override public void open(int taskNumber, int numTasks) throws IOException { if (taskNumber < 0) { throw new IllegalArgumentException("Invalid task number"); } if (numTasks == 0 || numTasks > 1) { throw new IllegalArgumentException( "must call setParallelism(1) to use " + ZippedJSONCollectionOutputFormat.class.getName()); } try { writer = gzip ? new PrintWriter(new GZIPOutputStream(new FileOutputStream(file))) : new PrintWriter(new FileOutputStream(file)); } catch (Exception e) { throw new RuntimeException(e); } } public File getFile() { return file; } @Override public void configure(Configuration parameters) { } @Override public void writeRecord(T record) throws IOException { writer.println(record.toString()); } @Override public void close() throws IOException { if (writer != null) { writer.close(); } } @Override public String toString() { return this.getClass().getSimpleName() + "(" + file + ")"; } } {code} > Output writers and OutputFormats need to support compression > > > Key: FLINK-6185 > URL: https://issues.apache.org/jira/browse/FLINK-6185 > Project: Flin
[jira] [Commented] (FLINK-6188) Some setParallelism() method can't cope with default parallelism
[ https://issues.apache.org/jira/browse/FLINK-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941990#comment-15941990 ] Aljoscha Krettek commented on FLINK-6188: - This is the thrown exception: {code} Caused by: java.lang.IllegalArgumentException: The parallelism of an operator must be at least 1. at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139) at org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator.setParallelism(SingleOutputStreamOperator.java:124) at org.apache.flink.streaming.api.datastream.DataStream.assignTimestampsAndWatermarks(DataStream.java:775) at org.apache.flink.streaming.api.scala.DataStream.assignTimestampsAndWatermarks(DataStream.scala:736) ... {code} The problem is that {{assignTimestampsAndWatermarks()}} uses {{setParallelism()}} with the parallelism of the upstream operation, which can be {{-1}}. > Some setParallelism() method can't cope with default parallelism > > > Key: FLINK-6188 > URL: https://issues.apache.org/jira/browse/FLINK-6188 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.1 >Reporter: Aljoscha Krettek >Priority: Blocker > Fix For: 1.2.1 > > > Recent changes done for FLINK-5808 move default parallelism manifestation > from eager to lazy, that is, the parallelism of operations that don't have an > explicit parallelism is only set when generating the JobGraph. Some > `setParallelism()` calls, such as > `SingleOutputStreamOperator.setParallelism()` cannot deal with the fact that > the parallelism of an operation might be {{-1}} (which indicates that it > should take the default parallelism when generating the JobGraph). > We should either revert the changes that fixed another user-facing bug for > version 1.2.1 or fix the methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
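A minimal sketch of the kind of guard that avoids the exception (illustrative only, not the actual fix applied to Flink): treat {{-1}} as "use the default parallelism" and simply skip the {{setParallelism()}} call in that case.

{code}
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;

// Illustrative helper: -1 marks "use the default parallelism" and must not be
// forwarded to setParallelism(), which requires a value of at least 1.
public final class ParallelismUtil {

    private ParallelismUtil() {
    }

    public static <T> SingleOutputStreamOperator<T> copyParallelism(
            SingleOutputStreamOperator<T> target, int upstreamParallelism) {
        if (upstreamParallelism > 0) {
            target.setParallelism(upstreamParallelism);
        }
        return target;
    }
}
{code}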
[jira] [Updated] (FLINK-6188) Some setParallelism() method can't cope with default parallelism
[ https://issues.apache.org/jira/browse/FLINK-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-6188: Description: Recent changes done for FLINK-5808 move default parallelism manifestation from eager to lazy, that is, the parallelism of operations that don't have an explicit parallelism is only set when generating the JobGraph. Some {{setParallelism()}} calls, such as {{SingleOutputStreamOperator.setParallelism()}} cannot deal with the fact that the parallelism of an operation might be {{-1}} (which indicates that it should take the default parallelism when generating the JobGraph). We should either revert the changes that fixed another user-facing bug for version 1.2.1 or fix the methods. was: Recent changes done for FLINK-5808 move default parallelism manifestation from eager to lazy, that is, the parallelism of operations that don't have an explicit parallelism is only set when generating the JobGraph. Some `setParallelism()` calls, such as `SingleOutputStreamOperator.setParallelism()` cannot deal with the fact that the parallelism of an operation might be {{-1}} (which indicates that it should take the default parallelism when generating the JobGraph). We should either revert the changes that fixed another user-facing bug for version 1.2.1 or fix the methods. > Some setParallelism() method can't cope with default parallelism > > > Key: FLINK-6188 > URL: https://issues.apache.org/jira/browse/FLINK-6188 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.1 >Reporter: Aljoscha Krettek >Priority: Blocker > Fix For: 1.2.1 > > > Recent changes done for FLINK-5808 move default parallelism manifestation > from eager to lazy, that is, the parallelism of operations that don't have an > explicit parallelism is only set when generating the JobGraph. Some > {{setParallelism()}} calls, such as > {{SingleOutputStreamOperator.setParallelism()}} cannot deal with the fact > that the parallelism of an operation might be {{-1}} (which indicates that it > should take the default parallelism when generating the JobGraph). > We should either revert the changes that fixed another user-facing bug for > version 1.2.1 or fix the methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3615: support flink-storm metrics
GitHub user RalphSu opened a pull request: https://github.com/apache/flink/pull/3615 support flink-storm metrics Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes. - [ ] General - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text") - The pull request addresses only one issue - Each commit in the PR has a meaningful commit message (including the JIRA id) - [ ] Documentation - Documentation has been added for new functionality - Old documentation affected by the pull request has been updated - JavaDoc for public methods has been added - [ ] Tests & Build - Functionality added by the pull request is covered by tests - `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/RalphSu/flink storm-metrics Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3615.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3615 commit f050988e71902f3af59d0d8c6a94abeaf2e91f9b Author: Ralph, Su Date: 2017-03-26T01:36:12Z support flink-storm metrics --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---