[GitHub] aljoscha commented on issue #6784: [FLINK-7811] Add support for Scala 2.12

2018-10-21 Thread GitBox
aljoscha commented on issue #6784: [FLINK-7811] Add support for Scala 2.12
URL: https://github.com/apache/flink/pull/6784#issuecomment-431645254
 
 
   Also, we won't merge it like this but have to think about how to integrate 
the Scala 2.12 build. This PR only builds Scala 2.12, for testing, without 
building Scala 2.11.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-7811) Add support for Scala 2.12

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658116#comment-16658116
 ] 

ASF GitHub Bot commented on FLINK-7811:
---

aljoscha commented on issue #6784: [FLINK-7811] Add support for Scala 2.12
URL: https://github.com/apache/flink/pull/6784#issuecomment-431645254
 
 
   Also, we won't merge it like this but have to think about how to integrate 
the Scala 2.12 build. This PR only builds Scala 2.12, for testing, without 
building Scala 2.11.




> Add support for Scala 2.12
> --
>
> Key: FLINK-7811
> URL: https://issues.apache.org/jira/browse/FLINK-7811
> Project: Flink
>  Issue Type: Sub-task
>  Components: Scala API
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10599) Provide documentation for the modern kafka connector

2018-10-21 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated FLINK-10599:
-
Component/s: Documentation

> Provide documentation for the modern kafka connector
> 
>
> Key: FLINK-10599
> URL: https://issues.apache.org/jira/browse/FLINK-10599
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation, Kafka Connector
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
> Fix For: 1.7.0
>
>






[jira] [Commented] (FLINK-10609) Source distribution tests fails with logging ClassCastException

2018-10-21 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658166#comment-16658166
 ] 

Till Rohrmann commented on FLINK-10609:
---

This should also apply to the current master, right [~Zentol]?

> Source distribution tests fails with logging ClassCastException
> ---
>
> Key: FLINK-10609
> URL: https://issues.apache.org/jira/browse/FLINK-10609
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Tests
>Affects Versions: 1.5.5, 1.6.2
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.5.5, 1.6.2
>
>
> [~fhueske] and [~dawidwys] have reported a test error while running the tests 
> for the source distribution of {{1.5.5-rc1}} and {{1.6.2-rc1}}.
> 1.6:
> {code:java}
> ERROR StatusLogger Unable to create class
> org.apache.flink.streaming.connectors.elasticsearch5.shaded.org.apache.logging.slf4j.SLF4JLoggerContextFactory
> specified in
> file:/home/fhueske/Downloads/flink-1.6.2/flink-connectors/flink-connector-elasticsearch5/target/classes/META-INF/log4j-provider.properties
>   java.lang.ClassNotFoundException:
> org.apache.flink.streaming.connectors.elasticsearch5.shaded.org.apache.logging.slf4j.SLF4JLoggerContextFactory
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>  at
> org.apache.logging.log4j.spi.Provider.loadLoggerContextFactory(Provider.java:96)
>  at org.apache.logging.log4j.LogManager.(LogManager.java:91)
>  at
> org.elasticsearch.common.logging.ESLoggerFactory.getLogger(ESLoggerFactory.java:49)
>  at org.elasticsearch.common.logging.Loggers.getLogger(Loggers.java:105)
>  at org.elasticsearch.node.Node.(Node.java:237)
>  at
> org.apache.flink.streaming.connectors.elasticsearch.EmbeddedElasticsearchNodeEnvironmentImpl$PluginNode.(EmbeddedElasticsearchNodeEnvironmentImpl.java:78)
>  at
> org.apache.flink.streaming.connectors.elasticsearch.EmbeddedElasticsearchNodeEnvironmentImpl.start(EmbeddedElasticsearchNodeEnvironmentImpl.java:54)
>  at
> org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase.prepare(ElasticsearchSinkTestBase.java:72)
>  
> {code}
> 1.5:
> {code:java}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; 
> support was removed in 8.0
> Running org.apache.flink.addons.hbase.HBaseConnectorITCase
> Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.557 sec <<< 
> FAILURE! - in org.apache.flink.addons.hbase.HBaseConnectorITCase
> org.apache.flink.addons.hbase.HBaseConnectorITCase  Time elapsed: 0.556 sec  
> <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.commons.logging.impl.SLF4JLocationAwareLog cannot be cast to 
> org.apache.commons.logging.impl.Log4JLogger
> org.apache.flink.addons.hbase.HBaseConnectorITCase  Time elapsed: 0.557 sec  
> <<< ERROR!
> java.lang.NullPointerException{code}





[GitHub] yanghua opened a new pull request #6889: [FLINK-10599][Documentation] Provide documentation for the modern kafka connector

2018-10-21 Thread GitBox
yanghua opened a new pull request #6889: [FLINK-10599][Documentation] Provide 
documentation for the modern kafka connector
URL: https://github.com/apache/flink/pull/6889
 
 
   ## What is the purpose of the change
   
   *This pull request provides documentation for the modern kafka connector*
   
   
   ## Brief change log
   
 - *Provide documentation for the modern kafka connector*
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / **docs** / 
JavaDocs / not documented)
   




[jira] [Commented] (FLINK-10599) Provide documentation for the modern kafka connector

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658169#comment-16658169
 ] 

ASF GitHub Bot commented on FLINK-10599:


yanghua opened a new pull request #6889: [FLINK-10599][Documentation] Provide 
documentation for the modern kafka connector
URL: https://github.com/apache/flink/pull/6889
 
 
   ## What is the purpose of the change
   
   *This pull request provides documentation for the modern kafka connector*
   
   
   ## Brief change log
   
 - *Provide documentation for the modern kafka connector*
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / **docs** / 
JavaDocs / not documented)
   




> Provide documentation for the modern kafka connector
> 
>
> Key: FLINK-10599
> URL: https://issues.apache.org/jira/browse/FLINK-10599
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation, Kafka Connector
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>






[jira] [Updated] (FLINK-10599) Provide documentation for the modern kafka connector

2018-10-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10599:
---
Labels: pull-request-available  (was: )

> Provide documentation for the modern kafka connector
> 
>
> Key: FLINK-10599
> URL: https://issues.apache.org/jira/browse/FLINK-10599
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation, Kafka Connector
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>






[GitHub] yanghua commented on issue #6889: [FLINK-10599][Documentation] Provide documentation for the modern kafka connector

2018-10-21 Thread GitBox
yanghua commented on issue #6889: [FLINK-10599][Documentation] Provide 
documentation for the modern kafka connector
URL: https://github.com/apache/flink/pull/6889#issuecomment-431659542
 
 
   Hi @aljoscha I provided a simple introduction and documentation for the 
modern Kafka connector. Since I am not a native English speaker, you are 
welcome to point out anywhere the documentation is inappropriate. Also, if it 
is missing something, please feel free to tell me. Thank you.




[jira] [Commented] (FLINK-10599) Provide documentation for the modern kafka connector

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658171#comment-16658171
 ] 

ASF GitHub Bot commented on FLINK-10599:


yanghua commented on issue #6889: [FLINK-10599][Documentation] Provide 
documentation for the modern kafka connector
URL: https://github.com/apache/flink/pull/6889#issuecomment-431659542
 
 
   Hi @aljoscha I provided a simple introduction and documentation for the 
modern Kafka connector. Since I am not a native English speaker, you are 
welcome to point out anywhere the documentation is inappropriate. Also, if it 
is missing something, please feel free to tell me. Thank you.




> Provide documentation for the modern kafka connector
> 
>
> Key: FLINK-10599
> URL: https://issues.apache.org/jira/browse/FLINK-10599
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation, Kafka Connector
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>






[jira] [Commented] (FLINK-10603) Reduce kafka test duration

2018-10-21 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658177#comment-16658177
 ] 

vinoyang commented on FLINK-10603:
--

This issue is caused by the increase in the timeout of testFailOnNoBroker 
(from 60 seconds to 120 seconds). The method's timeout was originally extended 
because it needs to access the Kafka client API partitionsFor, whose default 
timeout in Kafka client 2.0 is 60s.

According to the official description of the relevant configuration, the 
timeout period is determined by the configuration item (max.block.ms). So we 
tried to reduce that configuration to 30s.
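The change described above can be sketched as plain Kafka client configuration. 
The property name comes from the Kafka documentation; the class name and 
bootstrap address below are illustrative, not taken from the actual PR:

```java
import java.util.Properties;

public class KafkaTimeoutConfig {

    /**
     * Builds properties for a Kafka client whose blocking calls (such as
     * partitionsFor) give up after 30 seconds instead of the 60-second
     * default of the 2.0 client, so a no-broker test fails faster.
     */
    public static Properties clientProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // max.block.ms bounds how long calls like partitionsFor() and
        // send() may block before throwing a TimeoutException.
        props.setProperty("max.block.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = clientProperties("localhost:9092");
        System.out.println(props.getProperty("max.block.ms")); // prints 30000
    }
}
```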

> Reduce kafka test duration
> --
>
> Key: FLINK-10603
> URL: https://issues.apache.org/jira/browse/FLINK-10603
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kafka Connector, Tests
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>
> The tests for the modern kafka connector take more than 10 minutes which is 
> simply unacceptable.





[jira] [Comment Edited] (FLINK-10603) Reduce kafka test duration

2018-10-21 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658177#comment-16658177
 ] 

vinoyang edited comment on FLINK-10603 at 10/21/18 11:33 AM:
-

This issue is caused by the increase in the timeout of testFailOnNoBroker 
(from 60 seconds to 120 seconds). The method's timeout was originally extended 
because it needs to access the Kafka client API partitionsFor, whose default 
timeout in Kafka client 2.0 is 60s.

According to the official description of the relevant configuration, the 
timeout period is determined by the configuration item 
([max.block.ms|https://kafka.apache.org/documentation/]). So we tried to 
reduce that configuration to 30s.


was (Author: yanghua):
The reason for this issue is due to the increase in the timeout of 
testFailOnNoBroker (from 60 seconds to 120 seconds). The timeout for extending 
the method was originally because it needed access to the Kafka client API: 
partitionsFor. In kafka client 2.0 its default timeout is 60s.

According to the official description of the relevant configuration, the 
timeout period is determined by the configuration item (max.block.ms). So, we 
tried to reduce the configuration to 30s.

> Reduce kafka test duration
> --
>
> Key: FLINK-10603
> URL: https://issues.apache.org/jira/browse/FLINK-10603
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kafka Connector, Tests
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>
> The tests for the modern kafka connector take more than 10 minutes which is 
> simply unacceptable.





[jira] [Comment Edited] (FLINK-10603) Reduce kafka test duration

2018-10-21 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658177#comment-16658177
 ] 

vinoyang edited comment on FLINK-10603 at 10/21/18 11:34 AM:
-

This issue is caused by the increase in the timeout of testFailOnNoBroker 
(from 60 seconds to 120 seconds). The method's timeout was originally extended 
because it needs to access the Kafka client API partitionsFor, whose default 
timeout in Kafka client 2.0 is 60s.

According to the official description of the relevant configuration, the 
timeout period is determined by the configuration item 
([max.block.ms|https://kafka.apache.org/documentation/]). So we tried to 
reduce that configuration to 30s.


was (Author: yanghua):
The reason for this issue is due to the increase in the timeout of 
testFailOnNoBroker (from 60 seconds to 120 seconds). The timeout for extending 
the method was originally because it needed access to the Kafka client API: 
partitionsFor. In kafka client 2.0 its default timeout is 60s.

According to the official description of the relevant configuration, the 
timeout period is determined by the configuration item 
([#max.block.ms]https://kafka.apache.org/documentation/). So, we tried to 
reduce the configuration to 30s.

> Reduce kafka test duration
> --
>
> Key: FLINK-10603
> URL: https://issues.apache.org/jira/browse/FLINK-10603
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kafka Connector, Tests
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>
> The tests for the modern kafka connector take more than 10 minutes which is 
> simply unacceptable.





[jira] [Updated] (FLINK-10603) Reduce kafka test duration

2018-10-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10603:
---
Labels: pull-request-available  (was: )

> Reduce kafka test duration
> --
>
> Key: FLINK-10603
> URL: https://issues.apache.org/jira/browse/FLINK-10603
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kafka Connector, Tests
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> The tests for the modern kafka connector take more than 10 minutes which is 
> simply unacceptable.





[GitHub] yanghua opened a new pull request #6890: [FLINK-10603] Reduce kafka test duration

2018-10-21 Thread GitBox
yanghua opened a new pull request #6890: [FLINK-10603] Reduce kafka test 
duration
URL: https://github.com/apache/flink/pull/6890
 
 
   ## What is the purpose of the change
   
   *This pull request reduces kafka test duration*
   
   
   ## Brief change log
   
 - *Reduce kafka test duration*
   
   
   ## Verifying this change
   
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ **not documented**)
   




[jira] [Commented] (FLINK-10603) Reduce kafka test duration

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658180#comment-16658180
 ] 

ASF GitHub Bot commented on FLINK-10603:


yanghua opened a new pull request #6890: [FLINK-10603] Reduce kafka test 
duration
URL: https://github.com/apache/flink/pull/6890
 
 
   ## What is the purpose of the change
   
   *This pull request reduces kafka test duration*
   
   
   ## Brief change log
   
 - *Reduce kafka test duration*
   
   
   ## Verifying this change
   
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ **not documented**)
   




> Reduce kafka test duration
> --
>
> Key: FLINK-10603
> URL: https://issues.apache.org/jira/browse/FLINK-10603
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kafka Connector, Tests
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> The tests for the modern kafka connector take more than 10 minutes which is 
> simply unacceptable.





[jira] [Created] (FLINK-10622) Add end-to-end test for time versioned joins

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10622:
-

 Summary: Add end-to-end test for time versioned joins
 Key: FLINK-10622
 URL: https://issues.apache.org/jira/browse/FLINK-10622
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL, Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


We should add an end-to-end test to test the new time versioned joins 
functionality. Maybe we can add this to the existing Stream SQL end-to-end test 
{{test_streaming_sql.sh}}.





[jira] [Created] (FLINK-10623) Extend streaming SQL end-to-end test to test MATCH_RECOGNIZE

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10623:
-

 Summary: Extend streaming SQL end-to-end test to test 
MATCH_RECOGNIZE
 Key: FLINK-10623
 URL: https://issues.apache.org/jira/browse/FLINK-10623
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL, Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


We should extend the existing {{test_streaming_sql.sh}} to test the newly added 
{{MATCH_RECOGNIZE}} functionality.





[jira] [Created] (FLINK-10624) Extend SQL client end-to-end to test new KafkaTableSink

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10624:
-

 Summary: Extend SQL client end-to-end to test new KafkaTableSink
 Key: FLINK-10624
 URL: https://issues.apache.org/jira/browse/FLINK-10624
 Project: Flink
  Issue Type: Task
  Components: Kafka Connector, Table API & SQL, Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


With FLINK-9697, we added support for Kafka 2.0. We should extend the existing 
SQL client end-to-end test to also test the new {{KafkaTableSink}} against 
Kafka 2.0.





[jira] [Created] (FLINK-10625) Add MATCH_RECOGNIZE documentation

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10625:
-

 Summary: Add MATCH_RECOGNIZE documentation
 Key: FLINK-10625
 URL: https://issues.apache.org/jira/browse/FLINK-10625
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table API & SQL
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


The newly added {{MATCH_RECOGNIZE}} functionality needs to be documented.





[jira] [Created] (FLINK-10626) Add documentation for time versioned joins

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10626:
-

 Summary: Add documentation for time versioned joins
 Key: FLINK-10626
 URL: https://issues.apache.org/jira/browse/FLINK-10626
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table API & SQL
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


The Flink documentation should be updated to cover the newly added 
functionality.





[jira] [Created] (FLINK-10627) Extend test_streaming_file_sink to test S3 writer

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10627:
-

 Summary: Extend test_streaming_file_sink to test S3 writer
 Key: FLINK-10627
 URL: https://issues.apache.org/jira/browse/FLINK-10627
 Project: Flink
  Issue Type: Sub-task
  Components: filesystem-connector, Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


We should extend the {{test_streaming_file_sink.sh}} to test the new S3 
recoverable writer in an integrated way.





[jira] [Updated] (FLINK-10627) Extend test_streaming_file_sink to test S3 writer

2018-10-21 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-10627:
--
Description: 
We should extend the {{test_streaming_file_sink.sh}} to test the new S3 
recoverable writer in an integrated way.

The test should cover fault recovery. Moreover, it should check for resource 
cleanup.

  was:We should extend the {{test_streaming_file_sink.sh}} to test the new S3 
recoverable writer in an integrated way.


> Extend test_streaming_file_sink to test S3 writer
> -
>
> Key: FLINK-10627
> URL: https://issues.apache.org/jira/browse/FLINK-10627
> Project: Flink
>  Issue Type: Sub-task
>  Components: filesystem-connector, Tests
>Affects Versions: 1.7.0
>Reporter: Till Rohrmann
>Priority: Critical
> Fix For: 1.7.0
>
>
> We should extend the {{test_streaming_file_sink.sh}} to test the new S3 
> recoverable writer in an integrated way.
> The test should cover the fault recovery. Moreover it should check for 
> resource cleanup.





[jira] [Created] (FLINK-10628) Add end-to-end test for REST communication encryption

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10628:
-

 Summary: Add end-to-end test for REST communication encryption
 Key: FLINK-10628
 URL: https://issues.apache.org/jira/browse/FLINK-10628
 Project: Flink
  Issue Type: Sub-task
  Components: REST
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


With FLINK-10371 we added support for mutual authentication for the REST 
communication. We should adapt one of the existing end-to-end tests to require 
this feature for the REST communication.





[jira] [Commented] (FLINK-2485) Handle removal of Java Unsafe in Java9

2018-10-21 Thread Benoit MERIAUX (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658231#comment-16658231
 ] 

Benoit MERIAUX commented on FLINK-2485:
---

Hi,
I did a quick search on the usage of {code:java}sun.misc.Unsafe{code} and there 
are actually 39 occurrences, mainly in the core module, which accounts for 34 
hits, notably the memory package with 26 occurrences.

There is a replacement, {code:java}VarHandle{code}, available starting from 
JDK 9, so it is not backward compatible with JDK versions below 9.

Unsafe is still included in JDK 9 and 10, and removed from 11.

So this issue is not on the critical path to build with JDK 9 and 10.
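The replacement mentioned above can be illustrated with a small JDK 9+ sketch: 
a VarHandle view over a byte array provides the kind of raw integer access that 
Flink's memory package currently obtains via sun.misc.Unsafe. This is an 
illustrative sketch, not Flink's actual MemorySegment code:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandleSketch {

    // A VarHandle that views a byte[] as little-endian ints, replacing
    // Unsafe.putInt/getInt on raw memory with a supported JDK 9+ API.
    private static final VarHandle INT_HANDLE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    /** Writes an int at the given byte offset of the segment. */
    public static void putInt(byte[] memory, int offset, int value) {
        INT_HANDLE.set(memory, offset, value);
    }

    /** Reads an int from the given byte offset of the segment. */
    public static int getInt(byte[] memory, int offset) {
        return (int) INT_HANDLE.get(memory, offset);
    }

    public static void main(String[] args) {
        byte[] segment = new byte[16];
        putInt(segment, 4, 42);
        System.out.println(getInt(segment, 4)); // prints 42
    }
}
```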

> Handle removal of Java Unsafe in Java9
> --
>
> Key: FLINK-2485
> URL: https://issues.apache.org/jira/browse/FLINK-2485
> Project: Flink
>  Issue Type: Task
>  Components: Core
>Reporter: Henry Saputra
>Priority: Major
>
> Since Oracle may remove Java Unsafe [1] from Java 9, we need to make sure we 
> have an upgrade path for Apache Flink.
> [1] 
> https://docs.google.com/document/d/1GDm_cAxYInmoHMor-AkStzWvwE9pw6tnz_CebJQxuUE/edit?pli=1





[jira] [Commented] (FLINK-2485) Handle removal of Java Unsafe in Java9

2018-10-21 Thread Henry Saputra (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658251#comment-16658251
 ] 

Henry Saputra commented on FLINK-2485:
--

Agree. Will reduce the importance of this issue

> Handle removal of Java Unsafe in Java9
> --
>
> Key: FLINK-2485
> URL: https://issues.apache.org/jira/browse/FLINK-2485
> Project: Flink
>  Issue Type: Task
>  Components: Core
>Reporter: Henry Saputra
>Priority: Major
>
> Since Oracle may remove Java Unsafe [1] from Java 9, we need to make sure we 
> have an upgrade path for Apache Flink.
> [1] 
> https://docs.google.com/document/d/1GDm_cAxYInmoHMor-AkStzWvwE9pw6tnz_CebJQxuUE/edit?pli=1





[jira] [Updated] (FLINK-2485) Handle removal of Java Unsafe

2018-10-21 Thread Henry Saputra (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Saputra updated FLINK-2485:
-
   Priority: Minor  (was: Major)
Description: 
Since Oracle may remove Java Unsafe [1] from a future Java release, we need to 
make sure we have an upgrade path for Apache Flink.

[1] 
[https://docs.google.com/document/d/1GDm_cAxYInmoHMor-AkStzWvwE9pw6tnz_CebJQxuUE/edit?pli=1]

  was:
With potential Oracle will remove Java Unsafe[1]  from Java9 we need to make 
sure we have upgrade path for Apache Flink

[1] 
https://docs.google.com/document/d/1GDm_cAxYInmoHMor-AkStzWvwE9pw6tnz_CebJQxuUE/edit?pli=1

Summary: Handle removal of Java Unsafe  (was: Handle removal of Java 
Unsafe in Java9)

> Handle removal of Java Unsafe
> -
>
> Key: FLINK-2485
> URL: https://issues.apache.org/jira/browse/FLINK-2485
> Project: Flink
>  Issue Type: Task
>  Components: Core
>Reporter: Henry Saputra
>Priority: Minor
>
> Since Oracle may remove Java Unsafe [1] from a future Java release, we need 
> to make sure we have an upgrade path for Apache Flink.
> [1] 
> [https://docs.google.com/document/d/1GDm_cAxYInmoHMor-AkStzWvwE9pw6tnz_CebJQxuUE/edit?pli=1]





[GitHub] tillrohrmann commented on issue #6876: [FLINK-10251] Handle oversized response messages in AkkaRpcActor

2018-10-21 Thread GitBox
tillrohrmann commented on issue #6876: [FLINK-10251] Handle oversized response 
messages in AkkaRpcActor
URL: https://github.com/apache/flink/pull/6876#issuecomment-431698563
 
 
   The problem is that you always try to serialize the result. However, it 
should only be serialized if the sender is remote.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-10251) Handle oversized response messages in AkkaRpcActor

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658356#comment-16658356
 ] 

ASF GitHub Bot commented on FLINK-10251:


tillrohrmann commented on issue #6876: [FLINK-10251] Handle oversized response 
messages in AkkaRpcActor
URL: https://github.com/apache/flink/pull/6876#issuecomment-431698563
 
 
   The problem is that you always try to serialize the result. However, it 
should only be serialized if the sender is remote.




> Handle oversized response messages in AkkaRpcActor
> --
>
> Key: FLINK-10251
> URL: https://issues.apache.org/jira/browse/FLINK-10251
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> The {{AkkaRpcActor}} should check whether an RPC response which is sent to a 
> remote sender does not exceed the maximum framesize of the underlying 
> {{ActorSystem}}. If this is the case we should fail fast instead. We can 
> achieve this by serializing the response and sending the serialized byte 
> array.
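A minimal sketch of the fail-fast check described above. This is not the actual `AkkaRpcActor` implementation (which, per the review comment, should also skip serialization entirely when the sender is local); the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: serialize an RPC response destined for a *remote*
// sender and fail fast when it exceeds the transport's maximum frame size,
// instead of letting the underlying ActorSystem drop the message silently.
public class FrameSizeCheck {
    public static byte[] serializeWithLimit(Serializable response, long maxFrameSize)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(response);
        }
        byte[] bytes = bos.toByteArray();
        if (bytes.length > maxFrameSize) {
            // fail fast with a descriptive error rather than a lost message
            throw new IOException("Serialized response (" + bytes.length
                    + " bytes) exceeds maximum frame size " + maxFrameSize);
        }
        return bytes; // safe to ship as a serialized byte array
    }
}
```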





[jira] [Commented] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base

2018-10-21 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658357#comment-16658357
 ] 

Till Rohrmann commented on FLINK-10558:
---

I think in the future we might add a minimum number of running TMs to the 
session cluster. This number would ensure that at least that many TMs are 
always running. If we have this feature, then we don't need to submit a job to 
make the test work.

> Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new 
> code base
> ---
>
> Key: FLINK-10558
> URL: https://issues.apache.org/jira/browse/FLINK-10558
> Project: Flink
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: TisonKun
>Priority: Minor
>
> {{YARNHighAvailabilityITCase}}, 
> {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} 
> {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}}





[jira] [Commented] (FLINK-10527) Cleanup constant isNewMode in YarnTestBase

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658358#comment-16658358
 ] 

ASF GitHub Bot commented on FLINK-10527:


tillrohrmann closed pull request #6816: [FLINK-10527] Cleanup constant 
isNewMode in YarnTestBase
URL: https://github.com/apache/flink/pull/6816
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
 
b/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
index f027399be7f..fb2c6ccf2b0 100644
--- 
a/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
+++ 
b/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
@@ -132,28 +132,6 @@ public void testDetachedMode() throws 
InterruptedException, IOException {
// before checking any strings outputted by the CLI, first give 
it time to return
clusterRunner.join();
 
-   if (!isNewMode) {
-   checkForLogString("The Flink YARN client has been 
started in detached mode");
-
-   // in legacy mode we have to wait until the TMs are up 
until we can submit the job
-   LOG.info("Waiting until two containers are running");
-   // wait until two containers are running
-   while (getRunningContainers() < 2) {
-   sleep(500);
-   }
-
-   // additional sleep for the JM/TM to start and 
establish connection
-   long startTime = System.nanoTime();
-   while (System.nanoTime() - startTime < 
TimeUnit.NANOSECONDS.convert(10, TimeUnit.SECONDS) &&
-   !(verifyStringsInNamedLogFiles(
-   new String[]{"YARN Application Master 
started"}, "jobmanager.log") &&
-   verifyStringsInNamedLogFiles(
-   new String[]{"Starting 
TaskManager actor"}, "taskmanager.log"))) {
-   LOG.info("Still waiting for JM/TM to 
initialize...");
-   sleep(500);
-   }
-   }
-
// actually run a program, otherwise we wouldn't necessarily 
see any TaskManagers
// be brought up
Runner jobRunner = startWithArgs(new String[]{"run",
@@ -163,14 +141,12 @@ public void testDetachedMode() throws 
InterruptedException, IOException {
 
jobRunner.join();
 
-   if (isNewMode) {
-   // in "new" mode we can only wait after the job is 
submitted, because TMs
-   // are spun up lazily
-   LOG.info("Waiting until two containers are running");
-   // wait until two containers are running
-   while (getRunningContainers() < 2) {
-   sleep(500);
-   }
+   // in "new" mode we can only wait after the job is submitted, 
because TMs
+   // are spun up lazily
+   LOG.info("Waiting until two containers are running");
+   // wait until two containers are running
+   while (getRunningContainers() < 2) {
+   sleep(500);
}
 
// make sure we have two TMs running in either mode
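The repeated `while (getRunningContainers() < 2) { sleep(500); }` loops in the diff above are instances of a generic wait-until pattern. A self-contained sketch (hypothetical helper, not an actual Flink test utility):

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the polling pattern used in the test diff above:
// poll a condition every pollMillis until it holds or timeoutMillis elapses.
public class Await {
    public static boolean until(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

Factoring the loop out this way also makes it easy to bound the wait, which the raw `while`/`sleep` loops in the diff do not.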


 




> Cleanup constant isNewMode in YarnTestBase
> --
>
> Key: FLINK-10527
> URL: https://issues.apache.org/jira/browse/FLINK-10527
> Project: Flink
>  Issue Type: Sub-task
>  Components: YARN
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> This seems to be a residual problem with FLINK-10396; the constant is set to 
> true in that PR. Currently it has three usage scenarios:
> 1. In an assertion, where it causes an error:
> {code:java}
> assumeTrue("The new mode does not start TMs upfront.", !isNewMode);
> {code}
> 2. In {{if (!isNewMode)}} blocks, whose logic is never invoked, so the 
> blocks can be removed.
> 3. In {{if (isNewMode)}} blocks, whose logic is always invoked, so the if 
> statements can be removed.

[jira] [Created] (FLINK-10629) FlinkKafkaProducerITCase.testFlinkKafkaProducer10FailTransactionCoordinatorBeforeNotify failed on Travis

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10629:
-

 Summary: 
FlinkKafkaProducerITCase.testFlinkKafkaProducer10FailTransactionCoordinatorBeforeNotify
 failed on Travis
 Key: FLINK-10629
 URL: https://issues.apache.org/jira/browse/FLINK-10629
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector, Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


The 
{{FlinkKafkaProducerITCase.testFlinkKafkaProducer10FailTransactionCoordinatorBeforeNotify}}
 failed on Travis.

https://api.travis-ci.org/v3/job/443777257/log.txt





[GitHub] tillrohrmann closed pull request #6816: [FLINK-10527] Cleanup constant isNewMode in YarnTestBase

2018-10-21 Thread GitBox
tillrohrmann closed pull request #6816: [FLINK-10527] Cleanup constant 
isNewMode in YarnTestBase
URL: https://github.com/apache/flink/pull/6816
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
 
b/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
index f027399be7f..fb2c6ccf2b0 100644
--- 
a/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
+++ 
b/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java
@@ -132,28 +132,6 @@ public void testDetachedMode() throws 
InterruptedException, IOException {
// before checking any strings outputted by the CLI, first give 
it time to return
clusterRunner.join();
 
-   if (!isNewMode) {
-   checkForLogString("The Flink YARN client has been 
started in detached mode");
-
-   // in legacy mode we have to wait until the TMs are up 
until we can submit the job
-   LOG.info("Waiting until two containers are running");
-   // wait until two containers are running
-   while (getRunningContainers() < 2) {
-   sleep(500);
-   }
-
-   // additional sleep for the JM/TM to start and 
establish connection
-   long startTime = System.nanoTime();
-   while (System.nanoTime() - startTime < 
TimeUnit.NANOSECONDS.convert(10, TimeUnit.SECONDS) &&
-   !(verifyStringsInNamedLogFiles(
-   new String[]{"YARN Application Master 
started"}, "jobmanager.log") &&
-   verifyStringsInNamedLogFiles(
-   new String[]{"Starting 
TaskManager actor"}, "taskmanager.log"))) {
-   LOG.info("Still waiting for JM/TM to 
initialize...");
-   sleep(500);
-   }
-   }
-
// actually run a program, otherwise we wouldn't necessarily 
see any TaskManagers
// be brought up
Runner jobRunner = startWithArgs(new String[]{"run",
@@ -163,14 +141,12 @@ public void testDetachedMode() throws 
InterruptedException, IOException {
 
jobRunner.join();
 
-   if (isNewMode) {
-   // in "new" mode we can only wait after the job is 
submitted, because TMs
-   // are spun up lazily
-   LOG.info("Waiting until two containers are running");
-   // wait until two containers are running
-   while (getRunningContainers() < 2) {
-   sleep(500);
-   }
+   // in "new" mode we can only wait after the job is submitted, 
because TMs
+   // are spun up lazily
+   LOG.info("Waiting until two containers are running");
+   // wait until two containers are running
+   while (getRunningContainers() < 2) {
+   sleep(500);
}
 
// make sure we have two TMs running in either mode


 




[jira] [Resolved] (FLINK-10527) Cleanup constant isNewMode in YarnTestBase

2018-10-21 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-10527.
---
   Resolution: Fixed
Fix Version/s: 1.7.0

Fixed via 
https://github.com/apache/flink/commit/7b040b915504e59243c642b1f4a84c956d96d134

> Cleanup constant isNewMode in YarnTestBase
> --
>
> Key: FLINK-10527
> URL: https://issues.apache.org/jira/browse/FLINK-10527
> Project: Flink
>  Issue Type: Sub-task
>  Components: YARN
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> This seems to be a residual problem with FLINK-10396; the constant is set to 
> true in that PR. Currently it has three usage scenarios:
> 1. In an assertion, where it causes an error:
> {code:java}
> assumeTrue("The new mode does not start TMs upfront.", !isNewMode);
> {code}
> 2. In {{if (!isNewMode)}} blocks, whose logic is never invoked, so the 
> blocks can be removed.
> 3. In {{if (isNewMode)}} blocks, whose logic is always invoked, so the if 
> statements can be removed.





[jira] [Created] (FLINK-10630) Update migration tests for Flink 1.6

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10630:
-

 Summary: Update migration tests for Flink 1.6
 Key: FLINK-10630
 URL: https://issues.apache.org/jira/browse/FLINK-10630
 Project: Flink
  Issue Type: Task
  Components: Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


Similar to FLINK-10084, we need to update the migration tests for the latest 
Flink version.





[jira] [Created] (FLINK-10631) Update jepsen tests to run with multiple slots

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10631:
-

 Summary: Update jepsen tests to run with multiple slots
 Key: FLINK-10631
 URL: https://issues.apache.org/jira/browse/FLINK-10631
 Project: Flink
  Issue Type: Task
  Components: Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


After fixing FLINK-9455, we should update the Jepsen tests to run with multiple 
slots per {{TaskExecutor}}.





[jira] [Created] (FLINK-10632) Run general purpose test job with failures in per-job mode

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10632:
-

 Summary: Run general purpose test job with failures in per-job mode
 Key: FLINK-10632
 URL: https://issues.apache.org/jira/browse/FLINK-10632
 Project: Flink
  Issue Type: Sub-task
  Components: E2E Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


Similar to FLINK-8973, we should add an end-to-end which runs the general 
datastream job with failures on a per-job cluster with HA enabled (either 
directly the {{StandaloneJobClusterEntrypoint}} or a docker image based on this 
entrypoint).

We should kill the TMs as well as the cluster entrypoint and verify that the 
job recovers.





[jira] [Created] (FLINK-10633) End-to-end test: Prometheus reporter

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10633:
-

 Summary: End-to-end test: Prometheus reporter
 Key: FLINK-10633
 URL: https://issues.apache.org/jira/browse/FLINK-10633
 Project: Flink
  Issue Type: Sub-task
  Components: E2E Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


Add an end-to-end test using the {{PrometheusReporter}} to verify that all 
metrics are properly reported. Additionally verify that the newly introduce 
RocksDB metrics are accessible (see FLINK-10423).
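For reference, wiring up the reporter for such a test uses a configuration along these lines. This is a sketch: the reporter name (`prom`) and port are placeholders, and the exact keys should be checked against the metrics documentation of the targeted Flink version.

```yaml
# flink-conf.yaml fragment (sketch): register the Prometheus reporter so the
# end-to-end test can scrape metrics from the configured port.
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
```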





[jira] [Created] (FLINK-10634) End-to-end test: Metrics accessible via REST API

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10634:
-

 Summary: End-to-end test: Metrics accessible via REST API
 Key: FLINK-10634
 URL: https://issues.apache.org/jira/browse/FLINK-10634
 Project: Flink
  Issue Type: Sub-task
  Components: E2E Tests, Metrics, REST
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


Verify that Flink's metrics can be accessed via the REST API.





[jira] [Created] (FLINK-10635) End-to-end test: Take & resume from savepoint when using per-job mode

2018-10-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10635:
-

 Summary: End-to-end test: Take & resume from savepoint when using 
per-job mode
 Key: FLINK-10635
 URL: https://issues.apache.org/jira/browse/FLINK-10635
 Project: Flink
  Issue Type: Sub-task
  Components: E2E Tests, Tests
Affects Versions: 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0


Cancel with savepoint a job running in per-job mode (using the 
{{StandaloneJobClusterEntrypoint}}) and resume from this savepoint. This 
effectively verifies that FLINK-10309 has been properly fixed.





[jira] [Assigned] (FLINK-10624) Extend SQL client end-to-end to test new KafkaTableSink

2018-10-21 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-10624:


Assignee: vinoyang

> Extend SQL client end-to-end to test new KafkaTableSink
> ---
>
> Key: FLINK-10624
> URL: https://issues.apache.org/jira/browse/FLINK-10624
> Project: Flink
>  Issue Type: Task
>  Components: Kafka Connector, Table API & SQL, Tests
>Affects Versions: 1.7.0
>Reporter: Till Rohrmann
>Assignee: vinoyang
>Priority: Critical
> Fix For: 1.7.0
>
>
> With FLINK-9697, we added support for Kafka 2.0. We should also extend the 
> existing streaming client end-to-end test to also test the new 
> {{KafkaTableSink}} against Kafka 2.0.





[jira] [Commented] (FLINK-10624) Extend SQL client end-to-end to test new KafkaTableSink

2018-10-21 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658464#comment-16658464
 ] 

vinoyang commented on FLINK-10624:
--

I created an umbrella issue, FLINK-10598, to collect the issues related to the 
modern Kafka connector. Do you agree with placing this issue under FLINK-10598?

> Extend SQL client end-to-end to test new KafkaTableSink
> ---
>
> Key: FLINK-10624
> URL: https://issues.apache.org/jira/browse/FLINK-10624
> Project: Flink
>  Issue Type: Task
>  Components: Kafka Connector, Table API & SQL, Tests
>Affects Versions: 1.7.0
>Reporter: Till Rohrmann
>Assignee: vinoyang
>Priority: Critical
> Fix For: 1.7.0
>
>
> With FLINK-9697, we added support for Kafka 2.0. We should also extend the 
> existing streaming client end-to-end test to also test the new 
> {{KafkaTableSink}} against Kafka 2.0.





[jira] [Assigned] (FLINK-10630) Update migration tests for Flink 1.6

2018-10-21 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-10630:


Assignee: vinoyang

> Update migration tests for Flink 1.6
> 
>
> Key: FLINK-10630
> URL: https://issues.apache.org/jira/browse/FLINK-10630
> Project: Flink
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: Till Rohrmann
>Assignee: vinoyang
>Priority: Critical
> Fix For: 1.7.0
>
>
> Similar to FLINK-10084, we need to update the migration tests for the latest 
> Flink version.





[jira] [Commented] (FLINK-10603) Reduce kafka test duration

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658470#comment-16658470
 ] 

ASF GitHub Bot commented on FLINK-10603:


yanghua commented on issue #6890: [FLINK-10603] Reduce kafka test duration
URL: https://github.com/apache/flink/pull/6890#issuecomment-431724328
 
 
   I don't know why, but it seems that I can't update the PR and trigger Travis 
(China). I'm not sure how long GitHub will take, but from the results of my 
local tests, the correct configuration seems to be:
   
   ```
   properties.setProperty("default.api.timeout.ms", "3000"); // 
KafkaProducer#partitionsFor
   ```




> Reduce kafka test duration
> --
>
> Key: FLINK-10603
> URL: https://issues.apache.org/jira/browse/FLINK-10603
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kafka Connector, Tests
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> The tests for the modern kafka connector take more than 10 minutes which is 
> simply unacceptable.





[GitHub] yanghua commented on issue #6890: [FLINK-10603] Reduce kafka test duration

2018-10-21 Thread GitBox
yanghua commented on issue #6890: [FLINK-10603] Reduce kafka test duration
URL: https://github.com/apache/flink/pull/6890#issuecomment-431724328
 
 
   I don't know why, but it seems that I can't update the PR and trigger Travis 
(China). I'm not sure how long GitHub will take, but from the results of my 
local tests, the correct configuration seems to be:
   
   ```
   properties.setProperty("default.api.timeout.ms", "3000"); // 
KafkaProducer#partitionsFor
   ```




With regards,
Apache Git Services


[jira] [Assigned] (FLINK-10600) Provide End-to-end test cases for modern Kafka connectors

2018-10-21 Thread vinoyang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-10600:


Assignee: vinoyang

> Provide End-to-end test cases for modern Kafka connectors
> -
>
> Key: FLINK-10600
> URL: https://issues.apache.org/jira/browse/FLINK-10600
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kafka Connector
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 1.7.0
>
>






[jira] [Created] (FLINK-10636) ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed

2018-10-21 Thread dalongliu (JIRA)
dalongliu created FLINK-10636:
-

 Summary: ERROR 
org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - 
Authentication failed
 Key: FLINK-10636
 URL: https://issues.apache.org/jira/browse/FLINK-10636
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: 1.6.1
Reporter: dalongliu


Hi all, when I upgraded the Flink version from 1.3.2 to 1.6.1 (the version of 
FlinkKafkaConsumer is 0.8), I encountered the following problems after starting 
my job:
2018-10-22 09:54:42,987 WARN  
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - SASL 
configuration failed: javax.security.auth.login.LoginException: No JAAS 
configuration section named 'Client' was found in specified JAAS configuration 
file: '/tmp/jaas-8615545579287424864.conf'. Will continue connection to 
Zookeeper server without SASL authentication, if Zookeeper server allows it.
2018-10-22 09:54:42,993 INFO  
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Opening 
socket connection to server 10.208.75.87/10.208.75.87:2181
2018-10-22 09:54:42,993 INFO  
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Opening 
socket connection to server 10.208.75.85/10.208.75.85:2181
2018-10-22 09:54:42,993 INFO  
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Opening 
socket connection to server 10.208.75.85/10.208.75.85:2181
2018-10-22 09:54:42,994 INFO  
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Opening 
socket connection to server 10.208.75.87/10.208.75.87:2181
2018-10-22 09:54:42,994 ERROR 
org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - 
Authentication failed
2018-10-22 09:54:42,997 ERROR 
org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - 
Authentication failed
2018-10-22 09:54:42,994 INFO  
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Socket 
connection established to 10.208.75.85/10.208.75.85:2181, initiating session
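The first warning says the generated JAAS file has no `Client` section, which the shaded ZooKeeper client looks for when SASL is enabled. A sketch of such a section follows; the login module, keytab path, and principal are chosen purely for illustration and must be adapted to the actual environment:

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/flink.keytab"
  principal="flink/hostname@REALM";
};
```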





[jira] [Commented] (FLINK-10491) Deadlock during spilling data in SpillableSubpartition

2018-10-21 Thread zhijiang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658496#comment-16658496
 ] 

zhijiang commented on FLINK-10491:
--

[~till.rohrmann], I think it should be covered in Flink 1.7 if it can catch up 
with the feature freeze deadline.

> Deadlock during spilling data in SpillableSubpartition 
> ---
>
> Key: FLINK-10491
> URL: https://issues.apache.org/jira/browse/FLINK-10491
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.4, 1.6.1
>Reporter: Piotr Nowojski
>Assignee: zhijiang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> Originally reported here: 
> [https://lists.apache.org/thread.html/472c8f4a2711c5e217fadd9a88f8c73670218e7432bb81ba3f5076db@%3Cuser.flink.apache.org%3E]
> Thread dump (from 1.5.3 version) showing two deadlocked threads, because they 
> are taking two locks in different order:
> {noformat}
> Thread-1
> "DataSink (DATA#HadoopFileOutputFormat ) (1/2)@11002" prio=5 tid=0x3e2 nid=NA 
> waiting for monitor entry
> waiting for Map (Key Extractor) (1/10)@9967 to release lock on <0x2dfb> (a 
> java.util.ArrayDeque)
> at 
> org.apache.flink.runtime.io.network.partition.SpillableSubpartition.releaseMemory(SpillableSubpartition.java:223)
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.releaseMemory(ResultPartition.java:373)
> at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.setNumBuffers(LocalBufferPool.java:355)
> - locked <0x2dfd> (a java.util.ArrayDeque)
> at 
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.redistributeBuffers(NetworkBufferPool.java:402)
> at 
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.recycleMemorySegments(NetworkBufferPool.java:203)
> - locked <0x2da5> (a java.lang.Object)
> at 
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.recycleMemorySegments(NetworkBufferPool.java:193)
> at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.returnExclusiveSegments(SingleInputGate.java:318)
> at 
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.releaseAllResources(RemoteInputChannel.java:259)
> at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:578)
> at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.pollNextBufferOrEvent(SingleInputGate.java:507)
> at 
> org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.waitAndGetNextInputGate(UnionInputGate.java:213)
> at 
> org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:163)
> at 
> org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:86)
> at 
> org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
> at 
> org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
> at 
> org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:216)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
> at java.lang.Thread.run(Thread.java:745)
> Thread-2
> "Map (Key Extractor) (1/10)@9967" prio=5 tid=0xaab nid=NA waiting for monitor 
> entry
> java.lang.Thread.State: BLOCKED
> blocks DataSink (DATA#HadoopFileOutputFormat ) (1/2)@11002
> waiting for DataSink (DATA#HadoopFileOutputFormat ) (1/2)@11002 to release 
> lock on <0x2dfd> (a java.util.ArrayDeque)
> at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.recycle(LocalBufferPool.java:261)
> at 
> org.apache.flink.runtime.io.network.buffer.NetworkBuffer.deallocate(NetworkBuffer.java:171)
> at 
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:106)
> at 
> org.apache.flink.runtime.io.network.buffer.NetworkBuffer.recycleBuffer(NetworkBuffer.java:146)
> at 
> org.apache.flink.runtime.io.network.buffer.BufferConsumer.close(BufferConsumer.java:110)
> at 
> org.apache.flink.runtime.io.network.partition.SpillableSubpartition.spillFinishedBufferConsumers(SpillableSubpartition.java:271)
> at 
> org.apache.flink.runtime.io.network.partition.SpillableSubpartition.add(SpillableSubpartition.java:117)
> - locked <0x2dfb> (a java.util.ArrayDeque)
> at 
> org.apache.flink.runtime.io.network.partition.SpillableSubpartition.add(SpillableSubpartition.java:96)
> - locked <0x2dfc> (a 
> org.apache.flink.runtime.io.network.partition.SpillableSubpartition)
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.addBufferConsumer(ResultPartition.java:255)
> at 
> org.apache.flink.runtime.io.network.api.writer.Recor

[GitHub] isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault tolerant for DataSource…

2018-10-21 Thread GitBox
isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault 
tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-431734239
 
 
   Great discussion, thanks everybody. 
   @wenlong88, the scenario you mention is what I am trying to fix.  
[here](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 is a concrete example: a simple word count job produces inconsistent data 
on failover; the job should fail, but instead it succeeds with zero output.
   
   @tillrohrmann, **_InputSplitAssigner_** generates a list of _**InputSplit**_s. 
The order might not matter, but every split should be processed exactly once: if 
a task fails while processing an _**InputSplit**_, that _**InputSplit**_ should be 
processed again. In the batch scenario, however, this does not hold.  
[this](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 repro shows that the current codebase lacks this logic and therefore has a 
data inconsistency issue.
   
   It's not a problem in the streaming scenario, as the _**InputSplit**_ is 
treated as a record, e.g. _**ContinuousFileMonitoringFunction**_ collects each 
_**InputSplit**_, and every _**InputSplit**_ is guaranteed to be 
processed exactly once by Flink. @wenlong88, will this work in your scenario? 
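The exactly-once requirement on splits can be sketched as an assigner that remembers which task took which split and returns a failed task's splits to the pending queue. This is a hypothetical illustration, not Flink's actual `InputSplitAssigner` API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: track split assignments per task so that a failed
// task's splits can be handed out again, ensuring every split is processed.
public class TrackingSplitAssigner<S> {
    private final Deque<S> pending = new ArrayDeque<>();
    private final Map<Integer, Deque<S>> assigned = new HashMap<>();

    public TrackingSplitAssigner(Iterable<S> splits) {
        for (S s : splits) {
            pending.add(s);
        }
    }

    /** Hand the next split to the given task, remembering the assignment. */
    public synchronized S nextSplit(int taskId) {
        S split = pending.poll();
        if (split != null) {
            assigned.computeIfAbsent(taskId, k -> new ArrayDeque<>()).add(split);
        }
        return split; // null when no splits remain
    }

    /** On task failure, return its splits to the pending queue for reprocessing. */
    public synchronized void returnSplits(int taskId) {
        Deque<S> lost = assigned.remove(taskId);
        if (lost != null) {
            pending.addAll(lost);
        }
    }
}
```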
   
   
   
   




[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658517#comment-16658517
 ] 

ASF GitHub Bot commented on FLINK-10205:


isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault 
tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-431734273
 
 
   Great discussion, thanks everybody. 
   
   @wenlong88, the scenario you mention is what I am trying to fix.  
[here](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 is a concrete example: a simple word count job produces inconsistent data 
on failover; the job should fail, but instead it succeeds with zero output.
   
   @tillrohrmann, **_InputSplitAssigner_** generates a list of _**InputSplit**_s. 
The order might not matter, but every split should be processed exactly once: if 
a task fails while processing an _**InputSplit**_, that _**InputSplit**_ should be 
processed again. In the batch scenario, however, this does not hold.  
[this](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 repro shows that the current codebase lacks this logic and therefore has a 
data inconsistency issue.
   
   It's not a problem in the streaming scenario, as the _**InputSplit**_ is 
treated as a record, e.g. _**ContinuousFileMonitoringFunction**_ collects each 
_**InputSplit**_, and every _**InputSplit**_ is guaranteed to be 
processed exactly once by Flink. @wenlong88, will this work in your scenario? 
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> ---
>
> Key: FLINK-10205
> URL: https://issues.apache.org/jira/browse/FLINK-10205
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Affects Versions: 1.6.1, 1.6.2, 1.7.0
>Reporter: JIN SUN
>Assignee: JIN SUN
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve 
> better performance; however, when a DataSourceTask fails and reruns, it will 
> not get the same splits as its previous attempt. This introduces inconsistent 
> results or even data corruption.
> Furthermore, if two executions run at the same time (in the batch scenario), 
> these two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask 
> deterministic. The proposal is to save all splits into the ExecutionVertex, 
> and the DataSourceTask will pull splits from there.
>  document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]
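The proposal in the description can be sketched as follows (hypothetical names, 
not the real ExecutionVertex API): the split list is fixed per vertex once, and 
every execution attempt of that vertex, including failover reruns, pulls from 
the same stored list, making the task's inputs deterministic.

```python
class ExecutionVertexSketch:
    """Hypothetical sketch of the proposal: splits are stored on the vertex,
    so a rerun of the task reads exactly the same inputs as the first run."""

    def __init__(self, splits):
        self.splits = list(splits)   # decided once, never reshuffled

    def run_attempt(self):
        # every attempt (first run or failover rerun) sees the same splits
        return list(self.splits)


vertex = ExecutionVertexSketch(["s1", "s2"])
first = vertex.run_attempt()
rerun = vertex.run_attempt()          # e.g. after a task failure
assert first == rerun == ["s1", "s2"]
```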



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)






[GitHub] isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault tolerant for DataSource…

2018-10-21 Thread GitBox
isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault 
tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-431735667
 
 
   Great discussion, thanks everybody. 
   
   @wenlong88, the scenario you mention is what I am trying to fix.  
[Here](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 is a concrete example: a simple word count job produces inconsistent data 
during failover; the job should fail, but instead it succeeds with zero output.
   
   @tillrohrmann, **_InputSplitAssigner_** generates a list of _**InputSplit**_s. 
The order might not matter, but every input should be processed exactly once: if 
a task fails while processing an _**InputSplit**_, that _**InputSplit**_ should be 
processed again. However, in the batch scenario this might not hold. The 
_**DataSourceTask**_ calls the _InputSplitAssigner_ to get the next 
_**InputSplit**_, and depending on the implementation of the 
_InputSplitAssigner_, a failed _**InputSplit**_ might simply be discarded.  
[This](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 repro shows that _**LocatableInputSplitAssigner**_ discards a failed 
_**InputSplit**_ and thus has a data inconsistency issue.
   
   It is not a problem in the streaming scenario, because an _**InputSplit**_ is 
treated as a record. For example, in _**ContinuousFileMonitoringFunction**_ the 
_**InputSplit**_s are collected, and Flink guarantees that each _**InputSplit**_ 
is processed exactly once. @wenlong88, will this work in your scenario? 
   
   @tillrohrmann, the problem here is that this is a bug, so shouldn't we 
hotfix it instead of waiting for the new feature to become available?  
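   The failure mode can be illustrated with a toy fire-and-forget assigner (a 
sketch, not the real _**LocatableInputSplitAssigner**_): once a split is handed 
out it is gone, so a word count over the splits silently loses the failed 
task's data instead of failing the job.

```python
def fire_and_forget_count(splits, fail_on=None):
    """Count words across splits with an assigner that discards each split
    once handed out; a task failure on `fail_on` loses that split's data."""
    pending = list(splits)
    total = 0
    while pending:
        split = pending.pop(0)   # split leaves the assigner for good
        if split is fail_on:
            continue             # task failed; the split is never retried
        total += len(split.split())
    return total


splits = ["hello world", "foo bar baz"]
assert fire_and_forget_count(splits) == 5
# a failure on the first split silently drops two words:
assert fire_and_forget_count(splits, fail_on=splits[0]) == 3
```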
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services








[jira] [Comment Edited] (FLINK-9761) Potential buffer leak in PartitionRequestClientHandler during job failures

2018-10-21 Thread zhijiang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658543#comment-16658543
 ] 

zhijiang edited comment on FLINK-9761 at 10/22/18 3:56 AM:
---

I just quickly reviewed the related code and think this is still a problem, 
which exists only in non-credit-based mode.

When {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} 
is called by the canceler thread while a {{stagedBufferResponse}} currently 
exists, we directly set {{stagedBufferResponse = null}}, so there is no chance 
to consume and release this netty message any more, resulting in a leak.

 

Even though {{stagedMessages}} is not empty, the {{stagedMessagesHandler}} 
would only consume and release the messages in the {{stagedMessages}} list; it 
will not consume and release {{stagedBufferResponse}} first. So I think it 
still has a logic problem.

 

Maybe [~NicoK] should double-check whether I guessed the above issue correctly.


was (Author: zjwang):
I just quickly reviewed the related codes and think this is still a problem 
which exists only in non-credit-based mode.

When {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} 
is called by canceler thread, and the {{stagedBufferResponse}} is not 
currently. But we directly set {{stagedBufferResponse = null}}, so it has no 
chance to consume and release this netty message any more resulting in leak 
issue.

 

Even though the {{stageMessages}} is not empty, the {{stagedMessageHandler}} 
would only consume and release the messages in this {{stageMessages}} list, and 
it will not consume and release {{stagedBufferResponse}} firstly. So it still 
has logic problem I think.

 

Maybe need [~NicoK] double check if I guessed the above issue correctly.
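The leak described above boils down to dropping the last reference to a staged 
buffer without releasing it. A minimal sketch of the invariant (toy refcounting, 
hypothetical names, not Netty's actual {{ReferenceCounted}} API): release the 
staged response before clearing the field.

```python
class Buffer:
    """Toy refcounted buffer standing in for a staged netty message."""
    def __init__(self):
        self.refs = 1

    def release(self):
        self.refs -= 1


class Handler:
    def __init__(self, staged):
        self.staged_buffer_response = staged

    def notify_buffer_destroyed_leaky(self):
        # the suspected bug: the reference is dropped, nobody releases it
        self.staged_buffer_response = None

    def notify_buffer_destroyed_fixed(self):
        # release before clearing, so the buffer can be recycled
        if self.staged_buffer_response is not None:
            self.staged_buffer_response.release()
        self.staged_buffer_response = None


buf = Buffer()
Handler(buf).notify_buffer_destroyed_leaky()
assert buf.refs == 1     # still referenced: leaked

buf2 = Buffer()
Handler(buf2).notify_buffer_destroyed_fixed()
assert buf2.refs == 0    # properly released
```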

> Potential buffer leak in PartitionRequestClientHandler during job failures
> --
>
> Key: FLINK-9761
> URL: https://issues.apache.org/jira/browse/FLINK-9761
> Project: Flink
>  Issue Type: Bug
>  Components: Network
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Critical
> Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> {{PartitionRequestClientHandler#stagedMessages}} may be accessed from 
> multiple threads:
> 1) Netty's IO thread
> 2) During cancellation, 
> {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} is 
> called
> If {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} 
> thinks {{stagedMessages}} is empty, it will not install the 
> {{stagedMessagesHandler}} that consumes and releases buffers from received 
> messages.
> Unless some unexpected combination of code calls prevents this from 
> happening, this will leak the non-recycled buffers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-21 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-431737413
 
 
   @fhueske 
   Thanks for your patient review; it has been very helpful in making the PR 
more readable and polished. I have resolved your comments. Please take another 
look at your convenience.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (FLINK-7243) Add ParquetInputFormat

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658549#comment-16658549
 ] 

ASF GitHub Bot commented on FLINK-7243:
---



> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>
> Add a {{ParquetInputFormat}} to read data from an Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-21 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-431737481
 
 
   @fhueske 
   Thanks for your patient review. It is pretty helpful to make the PR more 
readable and flawless. Resolved your comments. Please read one more round at 
your most convenient time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-7243) Add ParquetInputFormat

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658551#comment-16658551
 ] 

ASF GitHub Bot commented on FLINK-7243:
---

HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-431737461
 
 
   @fhueske 
   Thanks for your patient review. It is pretty helpful to make the PR more 
readable and flawless. Resolved your comments. Please read one more round at 
your most convenient time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>
> Add a {{ParquetInputFormat}} to read data from a Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-21 Thread GitBox
HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-431737523
 
 
   @fhueske 
   Thanks for your patient review. It is pretty helpful to make the PR more 
readable and flawless. Resolved your comments. Please read one more round at 
your most convenient time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] HuangZhenQiu closed pull request #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-21 Thread GitBox
HuangZhenQiu closed pull request #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483
 
 
   

This is a PR merged from a forked repository. Because GitHub hides the
original diff on merge, it is reproduced below for the sake of provenance:

diff --git a/.gitignore b/.gitignore
index 20749c24242..fdf7bedfb26 100644
--- a/.gitignore
+++ b/.gitignore
@@ -19,6 +19,7 @@ tmp
 build-target
 flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/avro/
 flink-formats/flink-avro/src/test/java/org/apache/flink/formats/avro/generated/
+flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/generated/
 flink-runtime-web/web-dashboard/assets/fonts/
 flink-runtime-web/web-dashboard/node_modules/
 flink-runtime-web/web-dashboard/bower_components/
diff --git a/flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java b/flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java
index 14cf647cd24..4177af72318 100644
--- a/flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java
+++ b/flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java
@@ -233,6 +233,16 @@ protected static String extractFileExtension(String fileName) {
 	 */
 	protected boolean enumerateNestedFiles = false;
 
+	/**
+	 * The flag to specify whether to skip file splits with wrong schema.
+	 */
+	protected boolean skipWrongSchemaFileSplit = false;
+
+	/**
+	 * The flag to specify whether to skip corrupted record.
+	 */
+	protected boolean skipCorruptedRecord = true;
+
 	/**
 	 * Files filter for determining what files/directories should be included.
 	 */
@@ -463,6 +473,14 @@ public void configure(Configuration parameters) {
 		if (!this.enumerateNestedFiles) {
 			this.enumerateNestedFiles = parameters.getBoolean(ENUMERATE_NESTED_FILES_FLAG, false);
 		}
+
+		if (!this.skipWrongSchemaFileSplit) {
+			this.skipWrongSchemaFileSplit = parameters.getBoolean(SKIP_WRONG_SCHEMA_SPLITS, false);
+		}
+
+		if (this.skipCorruptedRecord) {
+			this.skipCorruptedRecord = parameters.getBoolean(SKIP_CORRUPTED_RECORD, true);
+		}
 	}
 
 	/**
@@ -1077,4 +1095,14 @@ private void abortWait() {
 	 * The config parameter which defines whether input directories are recursively traversed.
 	 */
 	public static final String ENUMERATE_NESTED_FILES_FLAG = "recursive.file.enumeration";
+
+	/**
+	 * The config parameter which defines whether to skip file split with wrong schema.
+	 */
+	public static final String SKIP_WRONG_SCHEMA_SPLITS = "skip.splits.wrong.schema";
+
+	/**
+	 * The config parameter which defines whether to skip corrupted record.
+	 */
+	public static final String SKIP_CORRUPTED_RECORD = "skip.corrupted.record";
 }
diff --git a/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java
new file mode 100644
index 000..dab1899a1ce
--- /dev/null
+++ b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet;
+
+import org.apache.flink.api.common.io.CheckpointableInputFormat;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.formats.parquet.utils.ParquetRecordR
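The configure() hunk above only consults a config parameter while the corresponding field still holds its default (false for skipWrongSchemaFileSplit, true for skipCorruptedRecord), so the parameters can enable the first flag or disable the second, but never overwrite a field that was already set the other way. A minimal, self-contained sketch of that resolution logic; the MiniConfiguration class below is a simplified stand-in for Flink's Configuration, not the real class:

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for Flink's Configuration (illustration only). */
class MiniConfiguration {
    private final Map<String, Boolean> values = new HashMap<>();

    void setBoolean(String key, boolean value) {
        values.put(key, value);
    }

    boolean getBoolean(String key, boolean defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

public class SkipFlagsSketch {
    // Defaults match the diff: keep wrong-schema splits, skip corrupted records.
    static boolean skipWrongSchemaFileSplit = false;
    static boolean skipCorruptedRecord = true;

    /** Mirrors the resolution order of the configure() hunk in the diff. */
    static void configure(MiniConfiguration parameters) {
        // Only fall back to the config parameter while the field is at its default.
        if (!skipWrongSchemaFileSplit) {
            skipWrongSchemaFileSplit = parameters.getBoolean("skip.splits.wrong.schema", false);
        }
        if (skipCorruptedRecord) {
            skipCorruptedRecord = parameters.getBoolean("skip.corrupted.record", true);
        }
    }

    public static void main(String[] args) {
        MiniConfiguration parameters = new MiniConfiguration();
        parameters.setBoolean("skip.splits.wrong.schema", true);
        configure(parameters);
        System.out.println(skipWrongSchemaFileSplit); // true: enabled via the parameter
        System.out.println(skipCorruptedRecord);      // true: default unchanged
    }
}
```

A consequence of this resolution order is that an explicitly set field always wins over the config parameter.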



[GitHub] HuangZhenQiu removed a comment on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-21 Thread GitBox
HuangZhenQiu removed a comment on issue #6483: [FLINK-7243][flink-formats] Add 
parquet input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-431737590
 
 
   @fhueske 
   Thanks for your patient review; it helped make the PR more readable and polished. I have resolved your comments. Please take another look at your convenience.







[GitHub] HuangZhenQiu opened a new pull request #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-10-21 Thread GitBox
HuangZhenQiu opened a new pull request #6483: [FLINK-7243][flink-formats] Add 
parquet input format
URL: https://github.com/apache/flink/pull/6483
 
 
   ## What is the purpose of the change
   
   This pull request creates a ParquetInputFormat so that Flink can process files in the Parquet format. Parquet records can be read as Row, POJO, or a generic Map.
   
   ## Brief change log
   
 - *Add a schema converter from Parquet types to Flink internal types*
 - *Add ParquetRecordReader to read Parquet records as Row*
 - *Add ParquetInputFormat that can be extended to convert Row to POJO or a generic Map*
   
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as ParquetRecordReaderTest, ParquetSchemaConverterTest, and ParquetInputFormatTest.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (JavaDocs)
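   The extension point described above (a base format that produces Row, with 
subclasses converting to POJO or Map) can be sketched as follows. This is a 
minimal, self-contained illustration; the class names and the `convert` hook 
are assumptions for the sketch, not the PR's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for an input format that yields records as Object[]
// rows; subclasses override convert() to expose other representations.
abstract class RowBasedFormat<T> {
    final String[] fieldNames;

    RowBasedFormat(String[] fieldNames) {
        this.fieldNames = fieldNames;
    }

    // In the real ParquetInputFormat this step would decode a Parquet record.
    abstract T convert(Object[] row);
}

// A subclass that converts each row into a field-name -> value map.
class MapFormat extends RowBasedFormat<Map<String, Object>> {
    MapFormat(String[] fieldNames) {
        super(fieldNames);
    }

    @Override
    Map<String, Object> convert(Object[] row) {
        Map<String, Object> record = new HashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            record.put(fieldNames[i], row[i]);
        }
        return record;
    }
}
```

   A POJO-producing subclass would follow the same shape, overriding 
`convert` to populate its target type instead of a map.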
   




With regards,
Apache Git Services


[jira] [Commented] (FLINK-7243) Add ParquetInputFormat

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658564#comment-16658564
 ] 

ASF GitHub Bot commented on FLINK-7243:
---

HuangZhenQiu opened a new pull request #6483: [FLINK-7243][flink-formats] Add 
parquet input format
URL: https://github.com/apache/flink/pull/6483
 
 
   ## What is the purpose of the change
   
   This pull request creates a ParquetInputFormat so that Flink can process 
files in the Parquet format. Parquet records can be read as Row, POJO, or 
generic Map.
   
   ## Brief change log
   
 - *Add a schema converter from Parquet types to Flink internal types*
 - *Add ParquetRecordReader to read Parquet records as Row*
 - *Add ParquetInputFormat, which can be extended to convert Row to POJO or 
generic Map*
   
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as 
ParquetRecordReaderTest, ParquetSchemaConverterTest and  ParquetInputFormatTest.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (JavaDocs)
   




> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>
> Add a {{ParquetInputFormat}} to read data from an Apache Parquet file. 





[GitHub] zhijiangW opened a new pull request #6891: [FLINK-10606][network][test] Construct NetworkEnvironment simple for tests

2018-10-21 Thread GitBox
zhijiangW opened a new pull request #6891: [FLINK-10606][network][test] 
Construct NetworkEnvironment simple for tests
URL: https://github.com/apache/flink/pull/6891
 
 
   ## What is the purpose of the change
   
   *Implement a simple constructor for `NetworkEnvironment`, which can be 
widely used in existing tests to remove duplicated code.*
   
   
   ## Brief change log
   
 - *Define a new simple constructor for `NetworkEnvironment`*
 - *Modify the existing tests based on the new constructor*
   
   ## Verifying this change
   
   This change is already covered by existing tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   




[jira] [Commented] (FLINK-10606) Construct NetworkEnvironment simple for tests

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658611#comment-16658611
 ] 

ASF GitHub Bot commented on FLINK-10606:


zhijiangW opened a new pull request #6891: [FLINK-10606][network][test] 
Construct NetworkEnvironment simple for tests
URL: https://github.com/apache/flink/pull/6891
 
 
   ## What is the purpose of the change
   
   *Implement a simple constructor for `NetworkEnvironment`, which can be 
widely used in existing tests to remove duplicated code.*
   
   
   ## Brief change log
   
 - *Define a new simple constructor for `NetworkEnvironment`*
 - *Modify the existing tests based on the new constructor*
   
   ## Verifying this change
   
   This change is already covered by existing tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   




> Construct NetworkEnvironment simple for tests
> -
>
> Key: FLINK-10606
> URL: https://issues.apache.org/jira/browse/FLINK-10606
> Project: Flink
>  Issue Type: Test
>  Components: Network, Tests
>Affects Versions: 1.6.1, 1.7.0
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, five tests create {{NetworkEnvironment}} in different places, 
> and most of the parameters used to construct it are the same.
> We can provide a simple way of constructing the {{NetworkEnvironment}} with 
> fewer parameters to deduplicate the common code, which will also benefit 
> future tests.
> We can define a static nested {{NetworkEnvironmentBuilder}} inside 
> {{NetworkEnvironment}}, or a simple constructor, as the option.
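
The static nested builder proposed above can be sketched as follows. This is a 
minimal, self-contained illustration; the field names and default values are 
assumptions for the sketch, not Flink's actual `NetworkEnvironment` parameters:

```java
// Sketch of a test-friendly builder: defaults cover the common case,
// so each test overrides only the parameters it actually cares about.
class NetworkEnvironmentStub {
    final int numNetworkBuffers;
    final int networkBufferSize;

    private NetworkEnvironmentStub(int numNetworkBuffers, int networkBufferSize) {
        this.numNetworkBuffers = numNetworkBuffers;
        this.networkBufferSize = networkBufferSize;
    }

    static class Builder {
        // Defaults shared by most tests (values are illustrative).
        private int numNetworkBuffers = 1024;
        private int networkBufferSize = 32 * 1024;

        Builder setNumNetworkBuffers(int n) {
            this.numNetworkBuffers = n;
            return this;
        }

        Builder setNetworkBufferSize(int n) {
            this.networkBufferSize = n;
            return this;
        }

        NetworkEnvironmentStub build() {
            return new NetworkEnvironmentStub(numNetworkBuffers, networkBufferSize);
        }
    }
}
```

A test that only cares about the buffer count then reduces to 
`new NetworkEnvironmentStub.Builder().setNumNetworkBuffers(2048).build()`.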





[jira] [Updated] (FLINK-10606) Construct NetworkEnvironment simple for tests

2018-10-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10606:
---
Labels: pull-request-available  (was: )

> Construct NetworkEnvironment simple for tests
> -
>
> Key: FLINK-10606
> URL: https://issues.apache.org/jira/browse/FLINK-10606
> Project: Flink
>  Issue Type: Test
>  Components: Network, Tests
>Affects Versions: 1.6.1, 1.7.0
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, five tests create {{NetworkEnvironment}} in different places, 
> and most of the parameters used to construct it are the same.
> We can provide a simple way of constructing the {{NetworkEnvironment}} with 
> fewer parameters to deduplicate the common code, which will also benefit 
> future tests.
> We can define a static nested {{NetworkEnvironmentBuilder}} inside 
> {{NetworkEnvironment}}, or a simple constructor, as the option.





[GitHub] zhijiangW opened a new pull request #6892: [FLINK-10607][network][test] Unify to remove duplicated NoOpResultPartitionConsumableNotifier

2018-10-21 Thread GitBox
zhijiangW opened a new pull request #6892: [FLINK-10607][network][test] Unify 
to remove duplicated NoOpResultPartitionConsumableNotifier
URL: https://github.com/apache/flink/pull/6892
 
 
   ## What is the purpose of the change
   
   *There are currently two identical 
`NoOpResultPartitionConsumableNotifier` implementations. We should retain only 
one and unify all usages in the related tests to avoid mocking.*
   
   ## Brief change log
   
 - *Keep only one `NoOpResultPartitionConsumableNotifier` for tests*
 - *Modify all the related tests to avoid mocking 
`ResultPartitionConsumableNotifier`*
   
   ## Verifying this change
   
   *This change is already covered by existing tests.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   




[jira] [Updated] (FLINK-10607) Unify to remove duplicated NoOpResultPartitionConsumableNotifier

2018-10-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10607:
---
Labels: pull-request-available  (was: )

> Unify to remove duplicated NoOpResultPartitionConsumableNotifier
> 
>
> Key: FLINK-10607
> URL: https://issues.apache.org/jira/browse/FLINK-10607
> Project: Flink
>  Issue Type: Test
>  Components: Network, Tests
>Affects Versions: 1.6.1, 1.7.0
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Minor
>  Labels: pull-request-available
>
> Currently there exist two identical {{NoOpResultPartitionConsumableNotifier}} 
> implementations in different tests. We can deduplicate the common code and 
> make it public for unified usage, which will also benefit future tests.
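
The deduplication target can be sketched as follows: one shared, stateless 
no-op implementation that tests pass in wherever a notifier is required, 
instead of constructing a mock each time. The interface shown here is a 
simplified stand-in for Flink's actual `ResultPartitionConsumableNotifier` 
signature:

```java
// Simplified stand-in for the notifier interface.
interface ConsumableNotifier {
    void notifyPartitionConsumable(String jobId, String partitionId);
}

// A single shared no-op implementation; an enum singleton is stateless
// and cheaper and clearer than building a fresh mock in every test.
enum NoOpConsumableNotifier implements ConsumableNotifier {
    INSTANCE;

    @Override
    public void notifyPartitionConsumable(String jobId, String partitionId) {
        // intentionally empty: tests only need "a notifier", not its behavior
    }
}
```

Tests that previously mocked the interface can then share 
`NoOpConsumableNotifier.INSTANCE` directly.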





[jira] [Commented] (FLINK-10607) Unify to remove duplicated NoOpResultPartitionConsumableNotifier

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658662#comment-16658662
 ] 

ASF GitHub Bot commented on FLINK-10607:


zhijiangW opened a new pull request #6892: [FLINK-10607][network][test] Unify 
to remove duplicated NoOpResultPartitionConsumableNotifier
URL: https://github.com/apache/flink/pull/6892
 
 
   ## What is the purpose of the change
   
   *There are currently two identical 
`NoOpResultPartitionConsumableNotifier` implementations. We should retain only 
one and unify all usages in the related tests to avoid mocking.*
   
   ## Brief change log
   
 - *Keep only one `NoOpResultPartitionConsumableNotifier` for tests*
 - *Modify all the related tests to avoid mocking 
`ResultPartitionConsumableNotifier`*
   
   ## Verifying this change
   
   *This change is already covered by existing tests.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   




> Unify to remove duplicated NoOpResultPartitionConsumableNotifier
> 
>
> Key: FLINK-10607
> URL: https://issues.apache.org/jira/browse/FLINK-10607
> Project: Flink
>  Issue Type: Test
>  Components: Network, Tests
>Affects Versions: 1.6.1, 1.7.0
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Minor
>  Labels: pull-request-available
>
> Currently there exist two identical {{NoOpResultPartitionConsumableNotifier}} 
> implementations in different tests. We can deduplicate the common code and 
> make it public for unified usage, which will also benefit future tests.





[GitHub] tzulitai commented on a change in pull request #6875: [FLINK-9808] [state backends] Migrate state when necessary in state backends

2018-10-21 Thread GitBox
tzulitai commented on a change in pull request #6875: [FLINK-9808] [state 
backends] Migrate state when necessary in state backends
URL: https://github.com/apache/flink/pull/6875#discussion_r226911417
 
 

 ##
 File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/AbstractRocksDBState.java
 ##
 @@ -154,6 +159,29 @@ public void setCurrentNamespace(N namespace) {
return backend.db.get(columnFamily, 
tmpKeySerializationView.getCopyOfBuffer());
}
 
+   public byte[] migrateSerializedValue(
+           byte[] serializedOldValue,
+           TypeSerializer<V> priorSerializer,
+           TypeSerializer<V> newSerializer) throws StateMigrationException {
+
+       try {
+           V value = priorSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStream(serializedOldValue)));
+
+           ByteArrayOutputStream baos = new ByteArrayOutputStream();
+           DataOutputViewStreamWrapper out = new DataOutputViewStreamWrapper(baos);
 
 Review comment:
   Good idea, will address this.




[jira] [Commented] (FLINK-9808) Implement state conversion procedure in state backends

2018-10-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658664#comment-16658664
 ] 

ASF GitHub Bot commented on FLINK-9808:
---

tzulitai commented on a change in pull request #6875: [FLINK-9808] [state 
backends] Migrate state when necessary in state backends
URL: https://github.com/apache/flink/pull/6875#discussion_r226911417
 
 

 ##
 File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/AbstractRocksDBState.java
 ##
 @@ -154,6 +159,29 @@ public void setCurrentNamespace(N namespace) {
return backend.db.get(columnFamily, 
tmpKeySerializationView.getCopyOfBuffer());
}
 
+   public byte[] migrateSerializedValue(
+           byte[] serializedOldValue,
+           TypeSerializer<V> priorSerializer,
+           TypeSerializer<V> newSerializer) throws StateMigrationException {
+
+       try {
+           V value = priorSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStream(serializedOldValue)));
+
+           ByteArrayOutputStream baos = new ByteArrayOutputStream();
+           DataOutputViewStreamWrapper out = new DataOutputViewStreamWrapper(baos);
 
 Review comment:
   Good idea, will address this.




> Implement state conversion procedure in state backends
> --
>
> Key: FLINK-9808
> URL: https://issues.apache.org/jira/browse/FLINK-9808
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Aljoscha Krettek
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> With FLINK-9377 in place, config snapshots serve as the single source of 
> truth for recreating restore serializers; the next step would be to utilize 
> this when performing a full-pass state conversion (i.e., read with the old / 
> restore serializer, write with the new serializer).
> For Flink's heap-based backends, state conversion inherently happens, since 
> all state is always deserialized after restore with the restore serializer, 
> and written with the new serializer on snapshots.
> For the RocksDB state backend, since state is lazily deserialized, state 
> conversion needs to happen for each registered state on its first access if 
> the newly registered serializer has a different serialization schema than the 
> previous serializer.
> This task should consist of three parts:
> 1. Allow {{CompatibilityResult}} to correctly distinguish between whether the 
> new serializer's schema is a) compatible with the serializer as it is, b) 
> compatible after the serializer has been reconfigured, or c) incompatible.
> 2. Introduce state conversion procedures in the RocksDB state backend. This 
> should occur on the first state access.
> 3. Make sure that all other backends no longer do redundant serializer 
> compatibility checks. That is not required because those backends always 
> perform full-pass state conversions.
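
The full-pass conversion described above (read with the restore serializer, 
then write with the new serializer) can be sketched with two toy serializers. 
The `Ser` interface here is a simplified stand-in for Flink's `TypeSerializer`, 
and both byte formats are invented purely for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class StateMigrationSketch {
    // Simplified stand-in for TypeSerializer<Integer>.
    interface Ser {
        byte[] serialize(int value);
        int deserialize(byte[] bytes);
    }

    // "Old" schema: the int stored as a decimal string.
    static final Ser OLD = new Ser() {
        public byte[] serialize(int value) {
            return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
        }
        public int deserialize(byte[] bytes) {
            return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
        }
    };

    // "New" schema: the int stored as 4 big-endian bytes.
    static final Ser NEW = new Ser() {
        public byte[] serialize(int value) {
            return ByteBuffer.allocate(4).putInt(value).array();
        }
        public int deserialize(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getInt();
        }
    };

    // The conversion itself: deserialize with the prior serializer,
    // then re-serialize with the new one.
    static byte[] migrate(byte[] oldBytes, Ser prior, Ser next) {
        return next.serialize(prior.deserialize(oldBytes));
    }
}
```

In the heap-based backends this read-then-rewrite happens implicitly on every 
restore/snapshot cycle; RocksDB has to perform it explicitly per stored value.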





[jira] [Commented] (FLINK-10634) End-to-end test: Metrics accessible via REST API

2018-10-21 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658666#comment-16658666
 ] 

Chesnay Schepler commented on FLINK-10634:
--

I don't believe this test is necessary, as several E2E tests already use the 
metric system (both via reporters and the REST API).

> End-to-end test: Metrics accessible via REST API
> 
>
> Key: FLINK-10634
> URL: https://issues.apache.org/jira/browse/FLINK-10634
> Project: Flink
>  Issue Type: Sub-task
>  Components: E2E Tests, Metrics, REST
>Affects Versions: 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.7.0
>
>
> Verify that Flink's metrics can be accessed via the REST API.





[GitHub] tzulitai commented on a change in pull request #6875: [FLINK-9808] [state backends] Migrate state when necessary in state backends

2018-10-21 Thread GitBox
tzulitai commented on a change in pull request #6875: [FLINK-9808] [state 
backends] Migrate state when necessary in state backends
URL: https://github.com/apache/flink/pull/6875#discussion_r226912089
 
 

 ##
 File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java
 ##
 @@ -1389,6 +1388,149 @@ private void copyStateDataHandleData(
return Tuple2.of(stateInfo.f0, newMetaInfo);
}
 
+   private RegisteredKeyValueStateBackendMetaInfo migrateStateIfNecessary(
+           StateDescriptor stateDesc,
+           TypeSerializer namespaceSerializer,
+           Tuple2 stateInfo,
+           @Nullable StateSnapshotTransformer snapshotTransformer) throws Exception {
+
+       StateMetaInfoSnapshot restoredMetaInfoSnapshot = restoredKvStateMetaInfos.get(stateDesc.getName());
+
+       Preconditions.checkState(
+           restoredMetaInfoSnapshot != null,
+           "Requested to check compatibility of a restored RegisteredKeyedBackendStateMetaInfo," +
+               " but its corresponding restored snapshot cannot be found.");
+
+       Preconditions.checkState(
+           restoredMetaInfoSnapshot.getBackendStateType() == StateMetaInfoSnapshot.BackendStateType.KEY_VALUE,
+           "Incompatible state types. " +
+               "Was [" + restoredMetaInfoSnapshot.getBackendStateType() + "], " +
+               "registered as [" + StateMetaInfoSnapshot.BackendStateType.KEY_VALUE + "].");
+
+       Preconditions.checkState(
+           Objects.equals(stateDesc.getName(), restoredMetaInfoSnapshot.getName()),
+           "Incompatible state names. " +
+               "Was [" + restoredMetaInfoSnapshot.getName() + "], " +
+               "registered with [" + stateDesc.getName() + "].");
+
+       final StateDescriptor.Type restoredType =
+           StateDescriptor.Type.valueOf(
+               restoredMetaInfoSnapshot.getOption(
+                   StateMetaInfoSnapshot.CommonOptionsKeys.KEYED_STATE_TYPE));
+
+       if (!Objects.equals(stateDesc.getType(), StateDescriptor.Type.UNKNOWN)
+               && !Objects.equals(restoredType, StateDescriptor.Type.UNKNOWN)) {
+
+           Preconditions.checkState(
+               stateDesc.getType() == restoredType,
+               "Incompatible key/value state types. " +
+                   "Was [" + restoredType + "], " +
+                   "registered with [" + stateDesc.getType() + "].");
+       }
+
+       TypeSerializer stateSerializer = stateDesc.getSerializer();
+
+       RegisteredKeyValueStateBackendMetaInfo newMetaInfo = new RegisteredKeyValueStateBackendMetaInfo<>(
+           stateDesc.getType(),
+           stateDesc.getName(),
+           namespaceSerializer,
+           stateSerializer,
+           snapshotTransformer);
+
+       CompatibilityResult namespaceCompatibility = CompatibilityUtil.resolveCompatibilityResult(
 
 Review comment:
   Yes, this indeed is not nice. We actually have a newer replacement class 
called `TypeSerializerSchemaCompatibility`, but at the moment the use of 
`CompatibilityResult` can't be completely avoided yet.
   
   I have been discussing with @dawidwys that we maybe should just remove 
`CompatibilityResult` completely in favor of the newer replacement, but this 
might come up as a separate commit / PR.




[GitHub] tzulitai commented on a change in pull request #6875: [FLINK-9808] [state backends] Migrate state when necessary in state backends

2018-10-21 Thread GitBox
tzulitai commented on a change in pull request #6875: [FLINK-9808] [state 
backends] Migrate state when necessary in state backends
URL: https://github.com/apache/flink/pull/6875#discussion_r226912034
 
 

 ##
 File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java
 ##
 @@ -1389,6 +1388,149 @@ private void copyStateDataHandleData(
return Tuple2.of(stateInfo.f0, newMetaInfo);
}
 
+   private RegisteredKeyValueStateBackendMetaInfo migrateStateIfNecessary(
+           StateDescriptor stateDesc,
+           TypeSerializer namespaceSerializer,
+           Tuple2 stateInfo,
+           @Nullable StateSnapshotTransformer snapshotTransformer) throws Exception {
+
+       StateMetaInfoSnapshot restoredMetaInfoSnapshot = restoredKvStateMetaInfos.get(stateDesc.getName());
+
+       Preconditions.checkState(
+           restoredMetaInfoSnapshot != null,
+           "Requested to check compatibility of a restored RegisteredKeyedBackendStateMetaInfo," +
+               " but its corresponding restored snapshot cannot be found.");
+
+       Preconditions.checkState(
+           restoredMetaInfoSnapshot.getBackendStateType() == StateMetaInfoSnapshot.BackendStateType.KEY_VALUE,
+           "Incompatible state types. " +
+               "Was [" + restoredMetaInfoSnapshot.getBackendStateType() + "], " +
+               "registered as [" + StateMetaInfoSnapshot.BackendStateType.KEY_VALUE + "].");
+
+       Preconditions.checkState(
+           Objects.equals(stateDesc.getName(), restoredMetaInfoSnapshot.getName()),
+           "Incompatible state names. " +
+               "Was [" + restoredMetaInfoSnapshot.getName() + "], " +
+               "registered with [" + stateDesc.getName() + "].");
+
+       final StateDescriptor.Type restoredType =
+           StateDescriptor.Type.valueOf(
+               restoredMetaInfoSnapshot.getOption(
+                   StateMetaInfoSnapshot.CommonOptionsKeys.KEYED_STATE_TYPE));
+
+       if (!Objects.equals(stateDesc.getType(), StateDescriptor.Type.UNKNOWN)
+               && !Objects.equals(restoredType, StateDescriptor.Type.UNKNOWN)) {
+
+           Preconditions.checkState(
+               stateDesc.getType() == restoredType,
+               "Incompatible key/value state types. " +
+                   "Was [" + restoredType + "], " +
+                   "registered with [" + stateDesc.getType() + "].");
+       }
+
+       TypeSerializer stateSerializer = stateDesc.getSerializer();
+
+       RegisteredKeyValueStateBackendMetaInfo newMetaInfo = new RegisteredKeyValueStateBackendMetaInfo<>(
+           stateDesc.getType(),
+           stateDesc.getName(),
+           namespaceSerializer,
+           stateSerializer,
+           snapshotTransformer);
+
+       CompatibilityResult namespaceCompatibility = CompatibilityUtil.resolveCompatibilityResult(
 
 Review comment:
   Yes, this indeed is not nice. We actually have a newer replacement class 
called `TypeSerializerSchemaCompatibility`, but at the moment the use of 
`CompatibilityResult` can't be completely avoided yet.
   
   I have been discussing with @dawidwys that we maybe should just remove 
`CompatibilityResult` completely in favor of the newer replacement, but this 
might come up as a separate commit / PR.



