Re: [DISCUSS] Flink SQL DDL Design

2018-12-13 Thread Jark Wu
Hi all,

Here are a bunch of my thoughts:

8) Support ROW/MAP/ARRAY data types
That's fine with me if we want to support them in the MVP. In my mind, we
can have a field type syntax like this:

```
fieldType ::=
  {
    simpleType
  | MAP
  | ARRAY
  | ROW
  }
```
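
For illustration only, here is a hypothetical CREATE TABLE statement using such
nested types; the exact syntax and the connector properties are placeholders
and not part of the proposal itself:

```
CREATE TABLE user_events (
  user_id    BIGINT,
  tags       ARRAY<VARCHAR>,
  properties MAP<VARCHAR, VARCHAR>,
  address    ROW<city VARCHAR, zip INT>
) WITH (
  'connector.type' = 'kafka'
);
```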

I have included this in @Shuyi's summary doc [1]. Please leave feedback
there!

[1]
https://docs.google.com/document/d/1ug1-aVBSCxZQk58kR-yaK2ETCgL3zg0eDUVGCnW2V9E/edit

3) SOURCE / SINK / BOTH
@Timo, a CREATE TABLE statement registers a virtual table in the session
or catalog. I don't think it is immutable, as we might also want to support
CREATE INDEX statements in the future. On the other hand, ACL is not a part
of the table definition; it belongs to the permission system, which is
usually stored somewhere else. So GRANT/REVOKE sounds like the more
standard option.

7) Table Update Mode
I agree with @Shuyi that the table update mode can be left out of the MVP,
because IMO the update mode will not break the current MVP design. It
should be something we can add later, like the CHANGE_FLAG you proposed. We
can continue this discussion when we finalize the MVP.

Meanwhile, the update mode is a big topic which may take several weeks to
discuss. For example: (a) do we support CHANGE_FLAG when the table supports
upsert (or when the table defines a primary key)? (b) the CHANGE_FLAG should
support both writing and reading; (c) currently we only support the true
(add) and false (retract) flag types, are they enough? (d) how do we connect
to an external storage system that also exposes insert/delete flags, like a
MySQL binlog?

Regarding the CHANGE_FLAG @Timo proposed, I think this is a good
direction. But should isRetraction be a physical field, with CHANGE_FLAG
acting like a constraint on it? If yes, what would the type of isRetraction
be?

4.b) Ingesting and writing timestamps to systems.
@Shuyi, PERSISTED can solve the problem of the field not being physically
stored. However, it doesn't solve the problem of how to write a field back
to the computed column, because "A computed column cannot be the
target of an INSERT or UPDATE statement" even if the computed column is
persisted. If we want to write a rowtime back to the external system, the
DML should look like this: "INSERT INTO sink SELECT a, rowtime FROM
source". The point is that `rowtime` must be specified in the INSERT
statement, which is why I hope the `rowtime` field in the Table is not a
computed column. See more information about PERSISTED in [2] [3].
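
For reference, a minimal sketch of the PERSISTED behavior described in [2] [3]
(SQL Server syntax) and of the DML we would like to support; the table and
column names are made up:

```
-- SQL Server: a PERSISTED computed column is physically stored, but it still
-- cannot be the target of an INSERT or UPDATE.
CREATE TABLE orders (
  price    DECIMAL(10, 2),
  quantity INT,
  total    AS price * quantity PERSISTED
);
INSERT INTO orders (price, quantity) VALUES (9.99, 3);        -- OK
-- INSERT INTO orders (price, quantity, total) VALUES (...);  -- rejected

-- Flink: the DML we want to be able to write, which requires `rowtime` to be
-- a regular (non-computed) field of the source table.
INSERT INTO sink SELECT a, rowtime FROM source;
```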

Another point to consider is that SYSTEMROWTIME() only solves reading the
timestamp from the message header. There are many similar requirements,
such as reading `topic`, `partition`, `offset` or custom properties from
message headers. Do we plan to support a bunch of built-in functions like
SYSTEMROWTIME()? Or do we have some cleaner and easier way for this?

[2]:
https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-computed-column-definition-transact-sql?view=sql-server-2017
[3]:
https://stackoverflow.com/questions/51390531/sql-server-persisted-computed-columns-versus-actual-normal-column

Looking forward to collaborating with you all!

Best,
Jark


On Thu, 13 Dec 2018 at 01:38, Rong Rong  wrote:

> Thanks for the summary effort @shuyi. Sorry for jumping in the discussion
> so late.
>
> As of the scope of MVP, I think we might want to consider adding "table
> update mode" problem to it. I agree with @timo that might not be easily
> changed in the future if the flags has to be part of the schema/column
> definition.
>
> Regarding the components under discussion.
> 4) Event-Time Attributes and Watermarks
> b, c) I actually like the special indicator way @fabian suggested to hint
> Flink to read time attributes directly from the system not the data `(ts AS
> SYSTEMROWTIME())`. It should also address the "compute field not emitted"
> problem by carrying the "virtual column" concept like @shuyi suggested.
> However if I understand correctly, this also required to be defined as part
> of the schema/column definition.
>
> 3) SOURCE / SINK / BOTH
> +1 on not adding properties to `CREATE TABLE` to manage ACL/permission.
>
> On a higher level, I think one question I have is whether we can
> definitively come to an agreement that the features under discussion (and
> potential solutions) can be cleanly adjusted/added from what we are
> providing on MVP (e.g. the schema/column definition might be hard to
> achieve but if we all agree ACL/permission should not be part of the
> `CREATE TABLE` and a decision can be made later). @shuyi I can also help in
> drafting the FLIP doc by summarizing the features under discussion and the
> concerns to whether included in the MVP, so that we can carry on the
> discussions alongside with the MVP implementation effort. I think each one
> of these features deserves a subsection dedicated for it.
>
> Many thanks,
> Rong
>
>
> On Wed, Dec 12, 2018 at 1:14 AM Shuyi Chen  wrot

Re: [SURVEY] Usage of flink-python and flink-streaming-python

2018-12-13 Thread Xianda Ke
Hi Folks,
To avoid polluting the survey thread with discussions, we have started a
separate thread, and maybe we can continue the discussion over there.

Regards,
Xianda

On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen  wrote:

> I like that we are having a general discussion about how to use Python and
> Flink together in the future.
> The current python support has some shortcomings that were mentioned
> before, so we clearly need something better.
>
> Parts of the community have worked together with the Apache Beam project,
> which is pretty far in adding a portability layer to support Python.
> Before we dive deep into a design proposal for a new Python API in Flink, I
> think we should figure out in which general direction Python support should
> go.
>
> *Option (1): Language portability via Apache Beam*
>
> Pro:
>   - already exists to a large extend and already has users
>   - portability layer offers other languages in addition to python. Go is
> in the making, NodeJS has been speculated, etc.
>   - collaboration with another project / community which means more
> manpower and exposure. Beam currently has a strong focus on Flink as a
> runner for Python.
>   - Python API is used for existing ML libraries from the TensorFlow
> ecosystem
>
> Con:
>   - Not Flink's API. Python users need to learn the syntax of another API
> (Python API is inherently different, but even more different here).
>
> *Option (2): Implement own Python API*
>
> Pro:
>   - Python API will be closer to Flink Java / Scala APIs
>
> Con:
>   - We will only have Python.
>   - Need to to rebuild the Python language bridge (significant work to get
> stable)
>   - might lose tight collaboration with Beam and the other parties in Beam
>   - not benefiting from Beam's ecosystem
>
> *Option (3): **Implement own portability layer*
>
> Pro
>   - Flexibility to align APIs across languages within Flink ecosystem
>
> Con
>   - A lot of work (for context, to get this feature complete, Beam has
> worked on that for a year now)
>   - Replicating work that already exists
>   - good chance to lose tight collaboration with Beam and parties in that
> project
>   - not benefiting from Beam's ecosystem
>
> Best,
> Stephan
>
>
> On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise  wrote:
>
> > Did you take a look at Apache Beam? It already provides a comprehensive
> > Python SDK and can be used with Flink:
> > https://beam.apache.org/roadmap/portability/#python-on-flink
> >
> > We are using it at Lyft for Python streaming pipelines.
> >
> > Thomas
> >
> > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke  wrote:
> >
> > > Hi Till,
> > >
> > > 1. So far as I know, most of the users at Alibaba are using SQL.  Some
> of
> > > users at Alibaba want integrated python libraries with Flink for
> > streaming
> > > processing, and Jython is unusable.
> > >
> > > 2. Python UDFs for SQL:
> > > * declaring python UDF based on Alibaba's internal DDL syntax.
> > > * start a Python process in open()
> > > * communicate with JVM process via Socket.
> > > * Yes, it support python libraries, users can upload virutalenv/conda
> > > Python runtime
> > >
> > > 3. We've draft a design doc for Python API
> > >  [DISCUSS] Flink Python API
> > > <
> > >
> >
> https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web
> > > >
> > >
> > > Python UDF for SQL is not discussed in this documentation, we'll
> create a
> > > new proposal when the SQL DDL is ready.
> > >
> > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann 
> > > wrote:
> > >
> > > > Hi Xianda,
> > > >
> > > > thanks for sharing this detailed feedback. Do I understand you
> > correctly
> > > > that flink-python and flink-streaming-python are not usable for the
> use
> > > > cases at Alibaba atm?
> > > >
> > > > Could you share a bit more details about the Python UDFs for SQL? How
> > do
> > > > you execute the Python code? Will it work with any Python library? If
> > you
> > > > are about to publish the design document then feel free to refer me
> to
> > > this
> > > > document.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke 
> wrote:
> > > >
> > > > > Xianda Ke 
> > > > > 9:47 AM (11 minutes ago)
> > > > > to dev, user
> > > > > After communicating with some of the internal users at Alibaba, my
> > > > > impression is that:
> > > > >
> > > > >- Most of them need C extensions support, they want to
> integrated
> > > > their
> > > > >algorithms with stream processing,but Jython is unacceptable for
> > > them.
> > > > >- For some users, who are only familiar with SQL/Python,
> > developing
> > > > Java
> > > > >API application/UDF is too complex. Writing Python UDF and
> > declaring
> > > > it
> > > > > in
> > > > >SQL is preferred.
> > > > >- Machine Learning users needs richer Python APIs, such as Table
> > API
> > > > >Python support.
> > > > >
> > > > >
> > > > > From my point of view, currently Python support has a few caveats
>

[DISCUSS] Python (and Non-JVM) Language Support in Flink

2018-12-13 Thread Xianda Ke
Currently there is an ongoing survey about Python usage in Flink [1]. Some
discussion was also brought up there regarding the non-JVM language support
strategy in general. To avoid polluting the survey thread, we are starting
this discussion thread and would like to move the discussions here.

In the interest of facilitating the discussion, we would like to first
share the following design doc, which describes what we have done at Alibaba
for the Python API for Flink. It could serve as a good reference for the
discussion.

 [DISCUSS] Flink Python API


As of now, we've implemented and delivered Python UDFs for SQL to the
internal users at Alibaba.
We are starting to implement the Python API.

To recap and continue the discussion from the survey thread, I agree with
@Stephan that we should figure out in which general direction Python
support should go. Stephan also listed three options there:
* Option (1): Language portability via Apache Beam
* Option (2): Implement own Python API
* Option (3): Implement own portability layer

From my perspective:
(1) Flink's language APIs and Beam's language support are not mutually
exclusive.
It is nice that Beam has Python/NodeJS/Go APIs and supports Flink as a
runner.
Flink's own Python (or NodeJS/Go) APIs will benefit Flink's ecosystem.

(2) Python API / portability layer
To support non-JVM languages in Flink:
 * on the client side, Flink would provide language interfaces, which
translate the user's application into a Flink StreamGraph;
 * on the server side, Flink would execute the user's UDF code at runtime
(a minimal sketch of this pattern follows below).
The non-JVM languages communicate with the JVM via RPC (or a low-level
socket, an embedded interpreter, and so on). What the portability layer can
do, perhaps, is abstract the RPC layer. Even when the portability layer is
ready, there is still a lot of work to do for each specific language. Take
Python: we may still have to write the interface classes by hand for users,
because generated code without detailed documentation is unacceptable, and
we have to handle the serialization of lambdas/closures, which is not a
built-in feature in Python. Maybe we can start with the Python API, then
extend to other languages and abstract the common logic into the
portability layer.
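
To make the server-side part above more concrete, here is a minimal,
self-contained Java sketch (not Flink code) of the pattern: the JVM launches a
Python worker process and exchanges UDF calls with it over a local socket. The
worker script name and the line-based protocol are hypothetical.

```
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PythonUdfBridgeSketch {

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // pick a free port
            // Start the (hypothetical) Python worker and tell it which port to connect to.
            Process worker = new ProcessBuilder(
                    "python", "python_worker.py", String.valueOf(server.getLocalPort()))
                    .inheritIO()
                    .start();
            try (Socket socket = server.accept();
                 BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                         socket.getOutputStream(), StandardCharsets.UTF_8));
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         socket.getInputStream(), StandardCharsets.UTF_8))) {
                // Send one UDF invocation (a single string argument) and read back the result.
                out.write("hello flink\n");
                out.flush();
                System.out.println("UDF result from Python: " + in.readLine());
            } finally {
                worker.destroy();
            }
        }
    }
}
```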

---
References:
[1] [SURVEY] Usage of flink-python and flink-streaming-python

Regards,
Xianda


[jira] [Created] (FLINK-11147) Add document for TableAggregate Function

2018-12-13 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-11147:
---

 Summary: Add document for TableAggregate Function
 Key: FLINK-11147
 URL: https://issues.apache.org/jira/browse/FLINK-11147
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Hequn Cheng
Assignee: Hequn Cheng


Add documentation for {{TableAggregateFunction}}, similar to the documentation of 
{{AggregateFunction}}: 
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/udfs.html#aggregation-functions

Most parts of {{TableAggregateFunction}} would be the same as 
{{AggregateFunction}}, except for the way outputs are handled. 
{{AggregateFunction}} outputs a scalar value, while {{TableAggregateFunction}} 
outputs a table with multiple rows and columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11148) Rescaling operator regression in Flink 1.7

2018-12-13 Thread Truong Duc Kien (JIRA)
Truong Duc Kien created FLINK-11148:
---

 Summary: Rescaling operator regression in Flink 1.7
 Key: FLINK-11148
 URL: https://issues.apache.org/jira/browse/FLINK-11148
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.7.0
Reporter: Truong Duc Kien


We have a job using 20 TaskManagers with 3 slots each.

Using Flink 1.4, when we rescale a data stream from 60 to 20, each TaskManager 
will only have one downstream slot, which receives the data from the 3 upstream 
slots in the same TaskManager.

Using Flink 1.7, this behaviour no longer holds true: multiple downstream slots 
are being assigned to the same TaskManager. This change is causing an imbalance 
in our TaskManager load.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [SURVEY] Usage of flink-python and flink-streaming-python

2018-12-13 Thread Stephan Ewen
You are right. Let's refocus this on the python user survey and spin out
another thread.

On Thu, Dec 13, 2018 at 9:56 AM Xianda Ke  wrote:

> Hi Folks,
> To avoid polluting the survey thread with discussions, we started separate
> thread and maybe we can continue the discussion over there.
>
> Regards,
> Xianda
>
> On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen  wrote:
>
> > I like that we are having a general discussion about how to use Python
> and
> > Flink together in the future.
> > The current python support has some shortcomings that were mentioned
> > before, so we clearly need something better.
> >
> > Parts of the community have worked together with the Apache Beam project,
> > which is pretty far in adding a portability layer to support Python.
> > Before we dive deep into a design proposal for a new Python API in
> Flink, I
> > think we should figure out in which general direction Python support
> should
> > go.
> >
> > *Option (1): Language portability via Apache Beam*
> >
> > Pro:
> >   - already exists to a large extend and already has users
> >   - portability layer offers other languages in addition to python. Go is
> > in the making, NodeJS has been speculated, etc.
> >   - collaboration with another project / community which means more
> > manpower and exposure. Beam currently has a strong focus on Flink as a
> > runner for Python.
> >   - Python API is used for existing ML libraries from the TensorFlow
> > ecosystem
> >
> > Con:
> >   - Not Flink's API. Python users need to learn the syntax of another API
> > (Python API is inherently different, but even more different here).
> >
> > *Option (2): Implement own Python API*
> >
> > Pro:
> >   - Python API will be closer to Flink Java / Scala APIs
> >
> > Con:
> >   - We will only have Python.
> >   - Need to to rebuild the Python language bridge (significant work to
> get
> > stable)
> >   - might lose tight collaboration with Beam and the other parties in
> Beam
> >   - not benefiting from Beam's ecosystem
> >
> > *Option (3): **Implement own portability layer*
> >
> > Pro
> >   - Flexibility to align APIs across languages within Flink ecosystem
> >
> > Con
> >   - A lot of work (for context, to get this feature complete, Beam has
> > worked on that for a year now)
> >   - Replicating work that already exists
> >   - good chance to lose tight collaboration with Beam and parties in that
> > project
> >   - not benefiting from Beam's ecosystem
> >
> > Best,
> > Stephan
> >
> >
> > On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise  wrote:
> >
> > > Did you take a look at Apache Beam? It already provides a comprehensive
> > > Python SDK and can be used with Flink:
> > > https://beam.apache.org/roadmap/portability/#python-on-flink
> > >
> > > We are using it at Lyft for Python streaming pipelines.
> > >
> > > Thomas
> > >
> > > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke  wrote:
> > >
> > > > Hi Till,
> > > >
> > > > 1. So far as I know, most of the users at Alibaba are using SQL.
> Some
> > of
> > > > users at Alibaba want integrated python libraries with Flink for
> > > streaming
> > > > processing, and Jython is unusable.
> > > >
> > > > 2. Python UDFs for SQL:
> > > > * declaring python UDF based on Alibaba's internal DDL syntax.
> > > > * start a Python process in open()
> > > > * communicate with JVM process via Socket.
> > > > * Yes, it support python libraries, users can upload virutalenv/conda
> > > > Python runtime
> > > >
> > > > 3. We've draft a design doc for Python API
> > > >  [DISCUSS] Flink Python API
> > > > <
> > > >
> > >
> >
> https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web
> > > > >
> > > >
> > > > Python UDF for SQL is not discussed in this documentation, we'll
> > create a
> > > > new proposal when the SQL DDL is ready.
> > > >
> > > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann 
> > > > wrote:
> > > >
> > > > > Hi Xianda,
> > > > >
> > > > > thanks for sharing this detailed feedback. Do I understand you
> > > correctly
> > > > > that flink-python and flink-streaming-python are not usable for the
> > use
> > > > > cases at Alibaba atm?
> > > > >
> > > > > Could you share a bit more details about the Python UDFs for SQL?
> How
> > > do
> > > > > you execute the Python code? Will it work with any Python library?
> If
> > > you
> > > > > are about to publish the design document then feel free to refer me
> > to
> > > > this
> > > > > document.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke 
> > wrote:
> > > > >
> > > > > > Xianda Ke 
> > > > > > 9:47 AM (11 minutes ago)
> > > > > > to dev, user
> > > > > > After communicating with some of the internal users at Alibaba,
> my
> > > > > > impression is that:
> > > > > >
> > > > > >- Most of them need C extensions support, they want to
> > integrated
> > > > > their
> > > > > >algorithms with stream processing,but Jython is unacceptable
> for
> > > > them.
> > > > > 

Re: Connection leak with flink elastic Sink

2018-12-13 Thread Chesnay Schepler

Specifically which connector are you using, and which Flink version?

On 12.12.2018 13:31, Vijay Bhaskar wrote:

Hi
We are using flink elastic sink which streams at the rate of 1000 
events/sec, as described in 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
We are observing connection leak of elastic connections. After few 
minutes all the open connections are exceeding the process limits of 
the max open descriptors and Job is getting terminated. But the  http 
connections with the elastic search server remain open forever. Am i 
missing any specific configuration setting to close the open 
connection, after serving the request?
But there is no such setting is described in the above documentation 
of elastic sink


Regards
Bhaskar





Re: Connection leak with flink elastic Sink

2018-12-13 Thread Tzu-Li (Gordon) Tai
Hi,

Besides the information that Chesnay requested, could you also provide a stack 
trace of the exception that caused the job to terminate in the first place?

The Elasticsearch sink does indeed close the internally used Elasticsearch 
client, which should in turn properly release all resources [1].
I would like to double check whether or not the case here is that that part of 
the code was never reached.

Cheers,
Gordon

[1] 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L334

On 13 December 2018 at 5:59:34 PM, Chesnay Schepler (ches...@apache.org) wrote:

Specifically which connector are you using, and which Flink version?  

On 12.12.2018 13:31, Vijay Bhaskar wrote:  
> Hi  
> We are using flink elastic sink which streams at the rate of 1000  
> events/sec, as described in  
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
>   
> We are observing connection leak of elastic connections. After few  
> minutes all the open connections are exceeding the process limits of  
> the max open descriptors and Job is getting terminated. But the http  
> connections with the elastic search server remain open forever. Am i  
> missing any specific configuration setting to close the open  
> connection, after serving the request?  
> But there is no such setting is described in the above documentation  
> of elastic sink  
>  
> Regards  
> Bhaskar  




Re: [DISCUSS] Python (and Non-JVM) Language Support in Flink

2018-12-13 Thread Shaoxuan Wang
RE: Stephan's options (
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-python-and-flink-streaming-python-td25793.html
)
* Option (1): Language portability via Apache Beam
* Option (2): Implement own Python API
* Option (3): Implement own portability layer

Hi Stephan,
Eventually, I think we should support both option 1 and option 3. IMO, these
two options are orthogonal. I agree with you that we can leverage the
existing work and ecosystem in Beam by supporting option 1. But the problem
with Beam is that (to the best of my knowledge) it skips the natural
Table/SQL optimization framework provided by Flink. We should spend all the
needed effort on option 1 (as it is a better alternative to the current
Flink Python API), but we cannot bet on it alone. Option 3 is the ideal
choice for Flink to support all non-JVM languages, and we should plan to
achieve it. We have done some preliminary prototyping for options 2/3, and
it does not seem overly complex or difficult to accomplish.

Regards,
Shaoxuan


On Thu, Dec 13, 2018 at 4:58 PM Xianda Ke  wrote:

> Currently there is an ongoing survey about Python usage of Flink [1]. Some
> discussion was also brought up there regarding non-jvm language support
> strategy in general. To avoid polluting the survey thread, we are starting
> this discussion thread and would like to move the discussions here.
>
> In the interest of facilitating the discussion, we would like to first
> share the following design doc which describes what we have done at Alibaba
> about Python API for Flink. It could serve as a good reference to the
> discussion.
>
>  [DISCUSS] Flink Python API
> <
> https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web
> >
>
> As of now, we've implemented and delivered Python UDF for SQL for the
> internal users at Alibaba.
> We are starting to implement Python API.
>
> To recap and continue the discussion from the survey thread, I agree with
> @Stephan that we should figure out in which general direction Python
> support should go. Stephan also list three options there:
> * Option (1): Language portability via Apache Beam
> * Option (2): Implement own Python API
> * Option (3): Implement own portability layer
>
> From my perspective,
> (1). Flink language APIs and Beam's languages support are not mutually
> exclusive.
> It is nice that Beam has Python/NodeJS/Go APIs, and support Flink as the
> runner.
> Flink's own Python(or NodeJS/Go) APIs will benefit Flink's ecosystem.
>
> (2). Python API / portability layer
> To support non-JVM languages in Flink,
>  * at client side, Flink would provide language interfaces, which will
> translate user's application to Flink StreamGraph.
> * at server side, Flink would execute user's UDF code at runtime
> The non-JVM languages communicate with JVM via RPC(or low-level socket,
> embedded interpreter and so on). What the portability layer can do maybe is
> abstracting the RPC layer. When the portability layer is ready, still there
> are lots of stuff to do for a specified language. Say, Python, we may still
> have to write the interface classes by hand for the users because generated
> code without detailed documentation is unacceptable for users, or handle
> the serialization issue of lambda/closure which is not a built-in feature
> in Python.  Maybe, we can start with Python API, then extend to other
> languages and abstract the logic in common as the portability layer.
>
> ---
> References:
> [1] [SURVEY] Usage of flink-python and flink-streaming-python
>
> Regards,
> Xianda
>


[jira] [Created] (FLINK-11149) Flink will request more containers than it actually needs

2018-12-13 Thread Fan Xinpu (JIRA)
Fan Xinpu created FLINK-11149:
-

 Summary: Flink will request more containers than it actually needs
 Key: FLINK-11149
 URL: https://issues.apache.org/jira/browse/FLINK-11149
 Project: Flink
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.7.0
Reporter: Fan Xinpu


  As is known, Flink requests new containers when it is notified that an 
allocated container has completed. Say one container fails and Flink tries to 
request one replacement container from the RM; in fact Flink will request n+1 
containers, where n is the number of containers ever requested since the cluster 
was created. This is not graceful.

  When requesting a container, Flink sends a ContainerRequest to the RM through 
the AMRMClient. The AMRMClient keeps the ContainerRequest internally and expects 
it to be removed later, but Flink never removes the ContainerRequest, so one by 
one the number of pending ContainerRequests accumulates to an unexpected value.

  In our environment, a cluster initially allocated 100 containers. Later on, 
when it requested one container from the RM, the RM returned more than 2000 
containers, because the request actually contained more than 2000 
ContainerRequests. Although Flink returns the excess containers, this request 
behavior wastes time and resources on YARN.

  So maybe Flink can remove the ContainerRequest after the request has been sent 
to the RM; then Flink will get exactly the number of containers it explicitly 
asked for (a sketch of one variant, removing the request upon allocation, 
follows below).
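
  For illustration, a hedged sketch (not Flink's actual YARN resource manager 
code) of this idea using the Hadoop AMRMClientAsync API: the pending 
ContainerRequest is removed as soon as a matching container has been allocated, 
so it is not re-sent on later heartbeats. The resource/priority values and the 
matching logic are simplified assumptions.

{code:java}
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

class ContainerAllocationSketch {

  private final AMRMClientAsync<ContainerRequest> amrmClient;
  private final Resource containerResource = Resource.newInstance(1024, 1);
  private final Priority priority = Priority.newInstance(0);

  ContainerAllocationSketch(AMRMClientAsync<ContainerRequest> amrmClient) {
    this.amrmClient = amrmClient;
  }

  void requestContainer() {
    amrmClient.addContainerRequest(
        new ContainerRequest(containerResource, null, null, priority));
  }

  /** Called from the AMRMClientAsync callback when containers are allocated. */
  void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
      // Remove one matching pending request per allocated container, so it is
      // not kept inside the AMRMClient and re-sent to the RM on the next heartbeat.
      List<? extends Collection<ContainerRequest>> matching =
          amrmClient.getMatchingRequests(priority, "*", containerResource);
      if (!matching.isEmpty() && !matching.get(0).isEmpty()) {
        amrmClient.removeContainerRequest(matching.get(0).iterator().next());
      }
      // ... launch a TaskManager in 'container' ...
    }
  }
}
{code}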



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Connection leak with flink elastic Sink

2018-12-13 Thread Vijay Bhaskar
Hi Gordon,
We are using a Flink 1.6.1 cluster with the Elasticsearch connector
flink-connector-elasticsearch6_2.11.
I have attached the stack trace.

Following are the max open file descriptor count of the Task Manager
process and the number of open connections to the Elasticsearch
cluster:

Regards
Bhaskar
#lsof -p 62041 | wc -l
65583

All the connections to the elastic cluster reached:

#netstat -aln | grep 9200 | wc -l
2333




On Thu, Dec 13, 2018 at 4:12 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi,
>
> Besides the information that Chesnay requested, could you also provide a
> stack trace of the exception that caused the job to terminate in the first
> place?
>
> The Elasticsearch sink does indeed close the internally used Elasticsearch
> client, which should in turn properly release all resources [1].
> I would like to double check whether or not the case here is that that
> part of the code was never reached.
>
> Cheers,
> Gordon
>
> [1]
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L334
>
> On 13 December 2018 at 5:59:34 PM, Chesnay Schepler (ches...@apache.org)
> wrote:
>
> Specifically which connector are you using, and which Flink version?
>
> On 12.12.2018 13:31, Vijay Bhaskar wrote:
> > Hi
> > We are using flink elastic sink which streams at the rate of 1000
> > events/sec, as described in
> >
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
>
> > We are observing connection leak of elastic connections. After few
> > minutes all the open connections are exceeding the process limits of
> > the max open descriptors and Job is getting terminated. But the http
> > connections with the elastic search server remain open forever. Am i
> > missing any specific configuration setting to close the open
> > connection, after serving the request?
> > But there is no such setting is described in the above documentation
> > of elastic sink
> >
> > Regards
> > Bhaskar
>
>
>
2018-12-13 06:14:42,120 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 

2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Starting 
StandaloneSessionClusterEntrypoint (Version: 1.6.1, Rev:23e2636, 
Date:14.09.2018 @ 19:56:46 UTC)
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  OS current 
user: root
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Current 
Hadoop/Kerberos user: 
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM: OpenJDK 
64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Maximum heap 
size: 981 MiBytes
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JAVA_HOME: 
(not set)
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  No Hadoop 
Dependency available
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM Options:
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
-Dlog.file=/root/flink-1.6.1/log/flink-root-standalonesession-0-contrail-eng-raisa-cloudsecure-eng-raisa-flink.log
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
-Dlog4j.configuration=file:/root/flink-1.6.1/conf/log4j.properties
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
-Dlogback.configurationFile=file:/root/flink-1.6.1/conf/logback.xml
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program 
Arguments:
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
/root/flink-1.6.1/conf
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
--executionMode
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Classpath: 
/root/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/root/flink-1.6.1/lib/log4j-1.2.17.jar:/root/flink-1.6.1/lib/log4j-api-2.9.1.jar:/root/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/root/flink-1.6.1/lib/flin

[jira] [Created] (FLINK-11150) Check exception messages in ValidationTest of flink-table

2018-12-13 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-11150:
---

 Summary: Check exception messages in ValidationTest of flink-table
 Key: FLINK-11150
 URL: https://issues.apache.org/jira/browse/FLINK-11150
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Hequn Cheng
Assignee: Hequn Cheng


Problem

Currently, there are a lot of {{ValidationTests}} in flink-table. These tests 
are used to check whether exceptions are thrown correctly. However, the detailed 
exception messages are not checked, which makes the tests fragile. Take the 
following test as an example:
{code:java}
class TableSinkValidationTest extends TableTestBase {

  @Test(expected = classOf[TableException])
  def testAppendSinkOnUpdatingTable(): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val t = StreamTestData.get3TupleDataStream(env).toTable(tEnv, 'id, 'num, 'text)
    tEnv.registerTableSink("testSink", new TestAppendSink)

    t.groupBy('text)
      .select('text, 'id.count, 'num.sum)
      .insertInto("testSink")

    // must fail because table is not append-only
    env.execute()
  }
}
{code}
The test is meant to check the validation for an AppendSink on an updating 
table. The test passes as-is, but if we remove the {{(expected = 
classOf[TableException])}}, we can see the following exception:
{code:java}
org.apache.flink.table.api.TableException: Table sink is not configured.
{code}
However, the correct exception should be:
{code:java}
org.apache.flink.table.api.TableException: AppendStreamTableSink requires that 
Table has only insert changes.
{code}
Since the two exceptions share the same exception class name, we also have to 
check the exception messages.

 

Proposal

To make our tests more rigorous, I think it is better to use 
{{ExpectedException}} to check both the exception class and the exception 
message. The previous test would then become:
{code:java}
  @Test
  def testAppendSinkOnUpdatingTable(): Unit = {
    expectedException.expect(classOf[TableException])
    expectedException.expectMessage(
      "AppendStreamTableSink requires that Table has only insert changes.")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val t = StreamTestData.get3TupleDataStream(env).toTable(tEnv, 'id, 'num, 'text)
    tEnv.registerTableSink("testSink", new TestAppendSink()
      .configure(Array("text", "id", "num"), Array(Types.STRING, Types.LONG, Types.LONG)))

    t.groupBy('text)
      .select('text, 'id.count, 'num.sum)
      .insertInto("testSink")

    // must fail because table is not append-only
    env.execute()
  }
{code}
which adds two more lines to the test:
{code:java}
expectedException.expect(classOf[TableException])
expectedException.expectMessage(
  "AppendStreamTableSink requires that Table has only insert changes.")
{code}
 

Any suggestions are greatly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11151) FileUploadHandler stops working if the upload directory is removed

2018-12-13 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-11151:


 Summary: FileUploadHandler stops working if the upload directory 
is removed
 Key: FLINK-11151
 URL: https://issues.apache.org/jira/browse/FLINK-11151
 Project: Flink
  Issue Type: Bug
  Components: Job-Submission, REST
Affects Versions: 1.7.0, 1.6.2
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.6.3, 1.8.0, 1.7.2


A user has reported on the ML that the FileUploadHandler does not accept any 
files anymore if the upload directory was deleted after the cluster has been 
started.
A cursory glance at the code shows that it currently uses 
{{Files.createDirectory(...)}} to create a temporary directory for the current 
request to store uploaded files in.
Changing this to use {{Files.createDirectories(...)}} instead should prevent 
this from happening again.
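
For illustration, a minimal sketch of the difference (the paths are made up):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UploadDirSketch {
  public static void main(String[] args) throws IOException {
    Path requestDir = Paths.get("/tmp/flink-web-upload/request-4711");

    // Files.createDirectory(requestDir) throws NoSuchFileException if the parent
    // directory "/tmp/flink-web-upload" has been deleted in the meantime.

    // Files.createDirectories recreates all missing parent directories, so a
    // deleted upload directory does not break subsequent uploads.
    Files.createDirectories(requestDir);
  }
}
{code}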



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11152) ClosureCleanerTest fails with IllegalStateException

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11152:


 Summary: ClosureCleanerTest fails with IllegalStateException
 Key: FLINK-11152
 URL: https://issues.apache.org/jira/browse/FLINK-11152
 Project: Flink
  Issue Type: Sub-task
  Components: Java API, Tests
Affects Versions: 1.8.0
Reporter: Gary Yao
Assignee: Gary Yao
 Fix For: 1.8.0


{{ClosureCleanerTest}} fails with the exception below:

{noformat}
java.lang.IllegalArgumentException
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.api.java.ClosureCleaner.getClassReader(ClosureCleaner.java:148)
at 
org.apache.flink.api.java.ClosureCleaner.cleanThis0(ClosureCleaner.java:115)
at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:75)
at 
org.apache.flink.api.java.functions.ClosureCleanerTest.testCleanedNonSerializable(ClosureCleanerTest.java:51)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11153) UdfAnalyzerTest fails with CodeAnalyzerException

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11153:


 Summary: UdfAnalyzerTest fails with CodeAnalyzerException
 Key: FLINK-11153
 URL: https://issues.apache.org/jira/browse/FLINK-11153
 Project: Flink
  Issue Type: Sub-task
  Components: Java API, Tests
Affects Versions: 1.8.0
Reporter: Gary Yao
 Fix For: 1.8.0


{noformat}
org.apache.flink.api.java.sca.CodeAnalyzerException: Exception occurred during 
code analysis.

at 
org.apache.flink.api.java.sca.UdfAnalyzer.analyze(UdfAnalyzer.java:341)
at 
org.apache.flink.api.java.sca.UdfAnalyzerTest.compareAnalyzerResultWithAnnotationsSingleInputWithKeys(UdfAnalyzerTest.java:1339)
at 
org.apache.flink.api.java.sca.UdfAnalyzerTest.compareAnalyzerResultWithAnnotationsSingleInput(UdfAnalyzerTest.java:1322)
at 
org.apache.flink.api.java.sca.UdfAnalyzerTest.testForwardWithArrayModification(UdfAnalyzerTest.java:695)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.IllegalArgumentException
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.api.java.sca.UdfAnalyzerUtils.findMethodNode(UdfAnalyzerUtils.java:131)
at 
org.apache.flink.api.java.sca.UdfAnalyzerUtils.findMethodNode(UdfAnalyzerUtils.java:115)
at 
org.apache.flink.api.java.sca.UdfAnalyzer.analyze(UdfAnalyzer.java:290)
... 25 more
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11154) Upgrade Netty to 4.1.32

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11154:
---

 Summary: Upgrade Netty to 4.1.32
 Key: FLINK-11154
 URL: https://issues.apache.org/jira/browse/FLINK-11154
 Project: Flink
  Issue Type: Improvement
  Components: Network
Affects Versions: 1.8.0
Reporter: Nico Kruber


Notable changes since 4.1.24 (currently used in Flink 1.6-1.7):
- big (performance, feature set) improvements for using openSSL based SSL 
engine (useful for FLINK-9816)
- allow multiple shaded versions of the same netty artifact (as long as the 
shaded prefix is different)
- Ensure ByteToMessageDecoder.Cumulator implementations always release.
- Don't re-arm timerfd each epoll_wait
- Use a non-volatile read for ensureAccessible() whenever possible to reduce 
overhead and allow better inlining.
- Do not fail on runtime when an older version of Log4J2 is on the classpath
- Fix leak and corruption bugs in CompositeByteBuf
- Add support for TLSv1.3
- Harden ref-counting concurrency semantics
- bug fixes
- Java 9-12 related fixes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11155) HadoopFreeFsFactoryTest fails with ClassCastException

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11155:


 Summary: HadoopFreeFsFactoryTest fails with ClassCastException
 Key: FLINK-11155
 URL: https://issues.apache.org/jira/browse/FLINK-11155
 Project: Flink
  Issue Type: Sub-task
Reporter: Gary Yao


{{HadoopFreeFsFactoryTest#testHadoopFactoryInstantiationWithoutHadoop}} fails 
with a ClassCastException because we try to cast the result of 
{{getClassLoader}} to {{URLClassLoader}} (also see 
https://stackoverflow.com/questions/46694600/java-9-compatability-issue-with-classloader-getsystemclassloader)
{noformat}
java.lang.ClassCastException: 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to 
java.base/java.net.URLClassLoader

at 
org.apache.flink.runtime.fs.hdfs.HadoopFreeFsFactoryTest.testHadoopFactoryInstantiationWithoutHadoop(HadoopFreeFsFactoryTest.java:46)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
{noformat}
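
For context, a minimal sketch (not the actual test code) of the cast that fails 
on Java 9+ and a defensive variant:

{code:java}
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderCastSketch {
  public static void main(String[] args) {
    ClassLoader loader = ClassLoader.getSystemClassLoader();

    // On Java 8 the system class loader happens to be a URLClassLoader, so a cast
    // like ((URLClassLoader) loader).getURLs() works; on Java 9+ the loader is
    // jdk.internal.loader.ClassLoaders$AppClassLoader and the cast throws.

    // Defensive variant: only inspect the URLs if the loader really is a URLClassLoader.
    if (loader instanceof URLClassLoader) {
      for (URL url : ((URLClassLoader) loader).getURLs()) {
        System.out.println(url);
      }
    } else {
      System.out.println("System class loader is not a URLClassLoader: " + loader.getClass());
    }
  }
}
{code}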



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11156) ZooKeeperCompletedCheckpointStoreMockitoTest fails with IncompatibleClassChangeError

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11156:


 Summary: ZooKeeperCompletedCheckpointStoreMockitoTest fails with 
IncompatibleClassChangeError
 Key: FLINK-11156
 URL: https://issues.apache.org/jira/browse/FLINK-11156
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.8.0
Reporter: Gary Yao
 Fix For: 1.8.0



{noformat}
java.lang.IncompatibleClassChangeError: Method 
java.util.Comparator.comparing(Ljava/util/function/Function;)Ljava/util/Comparator;
 must be InterfaceMethodref constant

at 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore.(ZooKeeperCompletedCheckpointStore.java:74)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:292)
at javassist.runtime.Desc.getClassObject(Desc.java:43)
at javassist.runtime.Desc.getClassType(Desc.java:152)
at javassist.runtime.Desc.getType(Desc.java:122)
at javassist.runtime.Desc.getType(Desc.java:78)
at 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStoreMockitoTest.testCheckpointRecovery(ZooKeeperCompletedCheckpointStoreMockitoTest.java:175)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:68)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:326)
at 
org.junit.internal.runners.MethodRoadie$1$1.call(MethodRoadie.java:64)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11157) Web UI should show timestamps relative to server time zone

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11157:
---

 Summary: Web UI should show timestamps relative to server time zone
 Key: FLINK-11157
 URL: https://issues.apache.org/jira/browse/FLINK-11157
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Affects Versions: 1.7.0
Reporter: Nico Kruber


Currently, it seems the web UI fetches timestamps, e.g. for checkpoints, as 
milliseconds since the epoch and simply converts them using the browser's clock 
and time zone. They should be based on the server's time zone instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11158) Do not show empty page for unknown jobs

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11158:
---

 Summary: Do not show empty page for unknown jobs
 Key: FLINK-11158
 URL: https://issues.apache.org/jira/browse/FLINK-11158
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Affects Versions: 1.7.0, 1.6.2, 1.5.5
Reporter: Nico Kruber


If you try to access the Web UI using an old/non-existing job ID, e.g. after a 
cluster restart, you are currently presented with a white page with no further 
info. It should at least show a "job not found" message to indicate that there 
was no other error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Creating last bug fix release for 1.5 branch

2018-12-13 Thread Thomas Weise
Hi,

I would be interested to try my hand at being the release manager for this.

There are currently still 5 in-progress issues [1], all except [2] with an
open PR.

Nico, Chesnay, Till: Can you please take a look and see if these can be
completed?

Thanks,
Thomas


[1]
https://issues.apache.org/jira/issues/?jql=statusCategory%20%3D%20indeterminate%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012344315%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
[2] https://issues.apache.org/jira/browse/FLINK-9010






On Mon, Dec 10, 2018 at 3:15 PM Thomas Weise  wrote:

> Thanks Till and my belated +1 for a final patch release :)
>
> On Mon, Dec 10, 2018 at 5:47 AM Till Rohrmann 
> wrote:
>
>> Thanks for the feedback! I conclude that the community is in favour of a
>> last 1.5.6 release. I'll try to make the arrangements in the next two
>> weeks.
>>
>> Cheers,
>> Till
>>
>> On Mon, Dec 10, 2018 at 2:40 AM jincheng sun 
>> wrote:
>>
>> > +1. There are incompatible improvements between 1.5.x and 1.6/1.7, so
>> many
>> > 1.5.x users may not be willing to upgrade to 1.6 or 1.7 due to migration
>> > costs, so it makes sense to creating last bug fix release for 1.5
>> branch.
>> >
>> > Bests,
>> > Jincheng
>> >
>> > Jeff Zhang  于2018年12月10日周一 上午9:24写道:
>> >
>> > > +1, I think very few people would use 1.6 or 1.7 in their production
>> in
>> > > near future, so I expect they would use 1.5 in production for a long
>> > > period,it makes sense to provide a stable version for production
>> usage.
>> > >
>> > > Ufuk Celebi  于2018年12月9日周日 下午6:07写道:
>> > >
>> > > > +1. This seems reasonable to me. Since the fixes are already in and
>> > > > also part of other releases, the release overhead should be
>> > > > manageable.
>> > > >
>> > > > @Vino: I agree with your assessment.
>> > > >
>> > > > @Qi: As Till mentioned, the official project guideline is to support
>> > > > the last two minor releases, e.g. currently 1.7 and 1.6.
>> > > >
>> > > > Best,
>> > > >
>> > > > Ufuk
>> > > >
>> > > > On Sun, Dec 9, 2018 at 3:48 AM qi luo  wrote:
>> > > > >
>> > > > > Hi Till,
>> > > > >
>> > > > > Does Flink has an agreement on how long will a major version be
>> > > > supported? Some companies may need a long time to upgrade Flink
>> major
>> > > > versions in production. If Flink terminates support for a major
>> version
>> > > too
>> > > > quickly, it may be a concern for companies.
>> > > > >
>> > > > > Best,
>> > > > > Qi
>> > > > >
>> > > > > > On Dec 8, 2018, at 10:57 AM, vino yang 
>> > > wrote:
>> > > > > >
>> > > > > > Hi Till,
>> > > > > >
>> > > > > > I think it makes sense to release a bug fix version (especially
>> > some
>> > > > > > serious bug fixes) for flink 1.5.
>> > > > > > Consider that some companies' production environments are more
>> > > cautious
>> > > > > > about upgrading large versions.
>> > > > > > I think some organizations are still using 1.5.x or even 1.4.x.
>> > > > > >
>> > > > > > Best,
>> > > > > > Vino
>> > > > > >
>> > > > > > Till Rohrmann  于2018年12月7日周五 下午11:39写道:
>> > > > > >
>> > > > > >> Dear community,
>> > > > > >>
>> > > > > >> I wanted to reach out to you and discuss whether we should
>> > release a
>> > > > last
>> > > > > >> bug fix release for the 1.5 branch.
>> > > > > >>
>> > > > > >> Since we have already released Flink 1.7.0, we only need to
>> > support
>> > > > the
>> > > > > >> 1.6.x and 1.7.x branches (last two major releases). However,
>> the
>> > > > current
>> > > > > >> release-1.5 branch contains 45 unreleased fixes. Some of the
>> fixes
>> > > > address
>> > > > > >> serializer duplication problems (FLINK-10839, FLINK-10693),
>> fixing
>> > > > > >> retractions (FLINK-10674) or prevent a deadlock in the
>> > > > > >> SpillableSubpartition (FLINK-10491). I think it would be nice
>> for
>> > > our
>> > > > users
>> > > > > >> if we officially terminated the Flink 1.5.x support with a last
>> > > 1.5.6
>> > > > > >> release. What do you think?
>> > > > > >>
>> > > > > >> Cheers,
>> > > > > >> Till
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Best Regards
>> > >
>> > > Jeff Zhang
>> > >
>> >
>>
>


[jira] [Created] (FLINK-11159) Allow configuration whether to fall back to savepoints for restore

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11159:
---

 Summary: Allow configuration whether to fall back to savepoints 
for restore
 Key: FLINK-11159
 URL: https://issues.apache.org/jira/browse/FLINK-11159
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.7.0, 1.6.2, 1.5.5
Reporter: Nico Kruber


Ever since FLINK-3397, upon failure, Flink restarts from the latest 
checkpoint/savepoint, whichever is more recent. With the introduction of local 
recovery and the knowledge that a RocksDB checkpoint restore just copies the 
files, it may be time to reconsider this / make it configurable:
In certain situations, it may be faster to restore from the latest checkpoint 
only (even if there is a more recent savepoint) and reprocess the data in 
between. On the downside, though, that may not be correct, because it might 
break side effects if the savepoint was the latest restore point, e.g. consider 
this chain: {{chk1 -> chk2 -> sp … restore chk2 -> …}}. Then all side effects 
between {{chk2 -> sp}} would be reproduced.

Making this configurable will allow users to choose whatever they need / can to 
get the lowest recovery time in Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11160) Confluent Avro Serialization Schema

2018-12-13 Thread Zhenhao Li (JIRA)
Zhenhao Li created FLINK-11160:
--

 Summary: Confluent Avro Serialization Schema 
 Key: FLINK-11160
 URL: https://issues.apache.org/jira/browse/FLINK-11160
 Project: Flink
  Issue Type: Improvement
  Components: Kafka Connector
Affects Versions: 1.7.0
Reporter: Zhenhao Li
 Fix For: 1.8.0, 1.7.2


Currently, Flink is missing a serialization schema that works with the Confluent 
Avro format and the Confluent schema registry.

I wrote something that solves this problem for the company I currently work at, 
and I think it would be nice to contribute it back to the community. It has been 
used in a Scala project, and that project has been deployed to production.

The new serialization schemas only serialize GenericRecord, and users have to 
pass the Avro schema files to the constructors. It might not be flexible enough 
to cover a broader set of use cases. The keyed serialization schema works only 
for Scala key-value pairs.
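
This is not the contributed code itself, just a rough sketch of one possible 
shape for such a schema, assuming Confluent's KafkaAvroSerializer handles the 
wire format and the registry interaction; the class name and configuration keys 
shown here are illustrative:

{code:java}
import java.util.Collections;

import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.serialization.SerializationSchema;

public class ConfluentAvroSerializationSchema implements SerializationSchema<GenericRecord> {

  private final String topic;
  private final String schemaRegistryUrl;
  private transient KafkaAvroSerializer serializer;

  public ConfluentAvroSerializationSchema(String topic, String schemaRegistryUrl) {
    this.topic = topic;
    this.schemaRegistryUrl = schemaRegistryUrl;
  }

  @Override
  public byte[] serialize(GenericRecord record) {
    if (serializer == null) {
      // Lazily create the Confluent serializer on the task side.
      serializer = new KafkaAvroSerializer();
      serializer.configure(
          Collections.singletonMap("schema.registry.url", schemaRegistryUrl), false);
    }
    return serializer.serialize(topic, record);
  }
}
{code}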



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Creating last bug fix release for 1.5 branch

2018-12-13 Thread Chesnay Schepler
FLINK-11023: will not be fixed for 1.5.6; this would take significantly 
longer to implement, and TBH I'm not really keen on doing that for a 
final bugfix release.
FLINK-7991: This is just a minor cleanup; the issue doesn't affect users 
in any way. It is thus not particularly important to have for this 
release and can be omitted IMO; I would also have to double-check 
whether the open PR applies properly to 1.5.6, and frankly I don't have 
the time for that right now anyway.
FLINK-10251: has been in review for a while, but will likely not be 
merged this year from what I know.
FLINK-9253: appears to require additional changes and is also quite 
outdated (it is from May after all), and looks more like a general 
improvement than a bug fix from the JIRA description. I would omit this 
from the release, unless Nico objects.


On 13.12.2018 17:08, Thomas Weise wrote:

Hi,

I would be interested to try my hand at being the release manger for this.

There are currently still 5 in-progress issues [1], all except [2] with an
open PR.

Nico, Chesnay, Till: Can you please take a look and see if these can be
completed?

Thanks,
Thomas


[1]
https://issues.apache.org/jira/issues/?jql=statusCategory%20%3D%20indeterminate%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012344315%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
[2] https://issues.apache.org/jira/browse/FLINK-9010







Create version 1.7.1 in JIRA

2018-12-13 Thread Thomas Weise
Hi,

Can a PMC member please create the version number 1.7.1. There are already
some JIRAs with 1.7.2 version that may have to updated as well.

https://issues.apache.org/jira/projects/FLINK?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page&status=released-unreleased

Thanks


Re: Create version 1.7.1 in JIRA

2018-12-13 Thread Chesnay Schepler
The 1.7.1 version already exists. As I've already started the release 
process for 1.7.1, I have archived this version temporarily to save myself the 
hassle of updating JIRAs that people now mark as fixed for 1.7.1 even 
though they aren't included.


On 13.12.2018 21:17, Thomas Weise wrote:

Hi,

Can a PMC member please create the version number 1.7.1. There are already
some JIRAs with 1.7.2 version that may have to updated as well.

https://issues.apache.org/jira/projects/FLINK?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page&status=released-unreleased

Thanks





Thanks for hiding ASF GitHub Bot logs on JIRA

2018-12-13 Thread Tzu-Li Chen
Not sure how we arrived here, but the somewhat noisy comments made by the ASF
GitHub Bot are now hidden in the Work Log. Thank you!

Best,
tison.


Re: Thanks for hiding ASF GitHub Bot logs on JIRA

2018-12-13 Thread Chesnay Schepler

ehh... from what I can tell this is not the case.

On 13.12.2018 22:00, Tzu-Li Chen wrote:

Not sure how we arrive here but comments a bit noisy made by ASF GitHub Bot
are now hidden to Work Log. Thank you!

Best,
tison.





Re: Thanks for hiding ASF GitHub Bot logs on JIRA

2018-12-13 Thread Tzu-Li Chen
hmm..then what is it


Chesnay Schepler  于2018年12月14日周五 上午5:07写道:

> ehhfrom what I can tell this is not the case.
>
> On 13.12.2018 22:00, Tzu-Li Chen wrote:
> > Not sure how we arrive here but comments a bit noisy made by ASF GitHub
> Bot
> > are now hidden to Work Log. Thank you!
> >
> > Best,
> > tison.
> >
>
>


[jira] [Created] (FLINK-11161) Unable to import java packages in scala-shell

2018-12-13 Thread Jeff Zhang (JIRA)
Jeff Zhang created FLINK-11161:
--

 Summary: Unable to import java packages in scala-shell
 Key: FLINK-11161
 URL: https://issues.apache.org/jira/browse/FLINK-11161
 Project: Flink
  Issue Type: Improvement
  Components: Scala Shell
Affects Versions: 1.8.0
Reporter: Jeff Zhang
 Attachments: 09_33_31__12_14_2018.jpg, 
image-2018-12-14-09-55-29-523.png, image-2018-12-14-09-56-18-275.png





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Apply for flink contributor permission

2018-12-13 Thread 宋辛童(五藏)
Hi there,

Could anyone kindly give me the contributor permission? My JIRA id is: wuzang

Thank You,
Tony Xintong Song



[jira] [Created] (FLINK-11162) Provide a rest API to list all logical operators

2018-12-13 Thread vinoyang (JIRA)
vinoyang created FLINK-11162:


 Summary: Provide a rest API to list all logical operators
 Key: FLINK-11162
 URL: https://issues.apache.org/jira/browse/FLINK-11162
 Project: Flink
  Issue Type: New Feature
  Components: REST
Reporter: vinoyang
Assignee: vinoyang


The context of this issue:
We are using the indicator (metric) variables of the operator.
We have customized the display of the indicator. For query purposes, we 
currently lack an interface to get all the logical operators of a job. The 
current REST API only provides the chained node information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11163) RestClusterClientTest#testSendIsNotRetriableIfHttpNotFound with BindException

2018-12-13 Thread TisonKun (JIRA)
TisonKun created FLINK-11163:


 Summary: 
RestClusterClientTest#testSendIsNotRetriableIfHttpNotFound with BindException
 Key: FLINK-11163
 URL: https://issues.apache.org/jira/browse/FLINK-11163
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.8.0
Reporter: TisonKun
 Fix For: 1.8.0


{quote}
03:14:22.321 [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 3.189 s <<< FAILURE! - in 
org.apache.flink.client.program.rest.RestClusterClientTest
03:14:22.322 [ERROR] 
testSendIsNotRetriableIfHttpNotFound(org.apache.flink.client.program.rest.RestClusterClientTest)
 Time elapsed: 0.043 s <<< ERROR!
java.net.BindException: Address already in use
{quote}

https://api.travis-ci.org/v3/job/467812798/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)