[jira] [Created] (FLINK-12093) Apache Flink: ActiveMQ consumer job finishes after consuming the first message.

2019-04-03 Thread shiv kumar (JIRA)
shiv kumar created FLINK-12093:
--

 Summary: Apache Flink: ActiveMQ consumer job finishes after 
consuming the first message.
 Key: FLINK-12093
 URL: https://issues.apache.org/jira/browse/FLINK-12093
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.7.2
 Environment: Working in my local IDE (Eclipse).
Reporter: shiv kumar


Hi Team,

 

Below is the code that sets up the execution environment to run the Apache Flink 
job that consumes messages from an ActiveMQ topic:

 

{code:java}
StreamExecutionEnvironment env = createExecutionEnvironment();

connectionFactory = new ActiveMQConnectionFactory("**", "**.",
    "failover:(tcp://amq-master-01:61668)?timeout=3000");

LOG.info("exceptionListener{}", new AMQExceptionListLocal(LOG, true));

RunningChecker runningChecker = new RunningChecker();
runningChecker.setIsRunning(true);

AMQSourceConfig config = new AMQSourceConfig.AMQSourceConfigBuilder()
    .setConnectionFactory(connectionFactory)
    .setDestinationName("test_flink")
    .setDeserializationSchema(deserializationSchema)
    .setRunningChecker(runningChecker)
    .setDestinationType(DestinationType.TOPIC)
    .build();

amqSource = new AMQSourceLocal<>(config);

LOG.info("Check whether ctx is null ::{}", amqSource);

DataStream dataMessage = env.addSource(amqSource);

dataMessage.writeAsText("C:/Users/shivkumar/Desktop/flinksjar/output.txt",
    WriteMode.OVERWRITE);
System.out.println("Step 1");

env.execute("Check ACTIVE_MQ");
{code}

 

When we start the job, the topic is created and a message is dequeued from 
that topic.

But after that, the job finishes. What can be done to keep the job running?
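For context: a streaming job reaches FINISHED once every source's run() method 
returns, so a source needs to loop inside run() until cancel() is called. A 
minimal sketch of that contract, using the standard SourceFunction API 
(pollNextMessage is a hypothetical stand-in for the ActiveMQ receive call):

{code:java}
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch: a streaming source stays RUNNING only while run() has not returned.
public class LoopingSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Keep consuming until cancel() flips the flag. If this loop exits
        // after the first message, the whole job transitions to FINISHED.
        while (running) {
            String message = pollNextMessage(); // hypothetical consumer call
            if (message != null) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(message);
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    // placeholder standing in for the actual ActiveMQ receive() call
    private String pollNextMessage() {
        return null;
    }
}
{code}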



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12094) Introduce sort merge join operator to blink batch

2019-04-03 Thread Jingsong Lee (JIRA)
Jingsong Lee created FLINK-12094:


 Summary: Introduce sort merge join operator to blink batch
 Key: FLINK-12094
 URL: https://issues.apache.org/jira/browse/FLINK-12094
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Runtime
Reporter: Jingsong Lee
Assignee: Jingsong Lee


Introduce SortMergeJoinOperator: an implementation that performs the join 
through a sort-merge join strategy.

Support all SQL join types: INNER, LEFT, RIGHT, FULL, SEMI, ANTI

1. In most cases, its performance is weaker than HashJoin's.
2. It is more stable than HashJoin, and most of the data can be sorted stably.
3. SortMergeJoin should be the best choice if the sort can be omitted, as in a 
multi-level join cascade on the same key.
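A minimal sketch of the INNER case of the strategy, assuming both inputs are 
already sorted on an integer join key (names are illustrative, not the 
operator's actual code):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch: merge two key-sorted inputs, emitting the cross product of
// each pair of matching key groups.
public final class SortMergeInnerJoin {

    public static List<int[]> join(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        int l = 0, r = 0;
        while (l < left.length && r < right.length) {
            if (left[l] < right[r]) {
                l++; // advance the side with the smaller key
            } else if (left[l] > right[r]) {
                r++;
            } else {
                int key = left[l];
                int lEnd = l, rEnd = r;
                while (lEnd < left.length && left[lEnd] == key) lEnd++;
                while (rEnd < right.length && right[rEnd] == key) rEnd++;
                for (int i = l; i < lEnd; i++) {
                    for (int j = r; j < rEnd; j++) {
                        out.add(new int[] {left[i], right[j]});
                    }
                }
                l = lEnd;
                r = rEnd;
            }
        }
        return out;
    }
}
{code}

With inputs sorted on the same key, a cascade of such joins can reuse the 
ordering, which is what point 3 above refers to.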



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Proposal] Shuffle resources lifecycle management

2019-04-03 Thread zhijiang
Hi Andrey,

Thanks for putting together this valuable proposal. It really extends and 
enhances the details of the partition lifecycle management previously mentioned 
in FLIP-31. I reviewed the Google doc and agree with the general points in it.

One implicit point in the doc is that the shuffle is not limited to serving only 
the downstream and upstream sides within one job; data shuffle may even be 
required across jobs, as in the interactive-programming feature, so the 
ShuffleMaster is not coupled with the JobMaster.
These thoughts are also consistent with our previous confirmation, and we can 
keep pushing in this direction over the long term. We will further focus on them 
in specific PRs, considering feasibility for the upcoming 1.9 release.

Best,
Zhijiang
--
From:Andrey Zagrebin 
Send Time: 2019-03-29 (Friday) 18:38
To:dev 
Subject:[Proposal] Shuffle resources lifecycle management

Hi All,

While working on the pluggable shuffle architecture and looking into the
interactive programming and fine-grained recovery efforts, we realized that
lifecycle management of intermediate result partitions needs more detailed
discussion to enable the envisioned use cases.

Here I prepared a design document to address this concern. The document
proposes certain extensions to FLIP-31 (Design of Pluggable Shuffle
Service):

https://docs.google.com/document/d/13vAJJxfRXAwI4MtO8dux8hHnNMw2Biu5XRrb_hvGehA

Looking forward to your feedback.

Thanks,
Andrey



[jira] [Created] (FLINK-12095) Canceling vs cancelling

2019-04-03 Thread Mike Pedersen (JIRA)
Mike Pedersen created FLINK-12095:
-

 Summary: Canceling vs cancelling
 Key: FLINK-12095
 URL: https://issues.apache.org/jira/browse/FLINK-12095
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Mike Pedersen


British and American English are mixed in some places, leading to confusion 
between "canceled" and "cancelled", and between "canceling" and "cancelling".

For example, it is incorrectly called "cancelling" here: 
https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html

Likewise in the state diagram of the ExecutionState itself: 
https://github.com/apache/flink/blob/8ed85fe49b7595546a8f968e0faa1fa7d4da47ec/flink-runtime/src/main/java/org/apache/flink/runtime/execution/ExecutionState.java#L31

The same applies to the /jobs REST schema. It is unclear whether this is due to 
incorrect documentation or a misspelling in the JSON output, i.e. whether the 
schema is wrong or correct but misspelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apply for JIRA permission

2019-04-03 Thread Chesnay Schepler

You now have contributor permissions.

On 03/04/2019 08:18, Hefei Li wrote:

Hi guys,

I want to contribute to Apache Flink.

Would you please give me contributor permission?

My JIRA username is *lhfei*.




===
Best Regards
Hefei Li
MP: +86  18701581473
MSN: lh...@live.cn
===





[jira] [Created] (FLINK-12096) Flink counter state getting stuck on usin

2019-04-03 Thread Yogesh Nandigamq (JIRA)
Yogesh Nandigamq created FLINK-12096:


 Summary: Flink counter state getting stuck on usin
 Key: FLINK-12096
 URL: https://issues.apache.org/jira/browse/FLINK-12096
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Reporter: Yogesh Nandigamq


I am working on a job that dynamically creates counters for a keyed stream, 
using a counter value state and the FileSystem state backend. It was observed 
that the counters get stuck after a few records, most probably because the 
counter value state is not being updated.

The FileSystem state backend was used because, with the RocksDB state backend, 
the counter value state was not being updated even for the first record.
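For reference, the usual shape of such a dynamic keyed counter with the 
standard state API; a minimal sketch, assuming a keyed String stream (the 
reporter's actual job is not shown in the issue):

{code:java}
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch: a per-key counter kept in ValueState. The update() call is what
// must take effect for the counter to advance between records.
public class KeyedCounter extends RichFlatMapFunction<String, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Types.LONG));
    }

    @Override
    public void flatMap(String value, Collector<Long> out) throws Exception {
        Long current = count.value();            // null on first record per key
        long next = (current == null ? 0L : current) + 1L;
        count.update(next);                      // persist the new counter value
        out.collect(next);
    }
}
{code}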



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12097) Flink SQL hangs in StreamTableEnvironment.sqlUpdate, keeps executing and seems never stop, finally lead to java.lang.OutOfMemoryError: GC overhead limit exceeded

2019-04-03 Thread xu (JIRA)
xu created FLINK-12097:
--

 Summary: Flink SQL hangs in StreamTableEnvironment.sqlUpdate, 
keeps executing and seems never stop, finally lead to 
java.lang.OutOfMemoryError: GC overhead limit exceeded
 Key: FLINK-12097
 URL: https://issues.apache.org/jira/browse/FLINK-12097
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.7.2
Reporter: xu
 Attachments: DSL.txt

Hi Experts,
There is a Flink application (version 1.7.2) written in Flink SQL. The SQL in 
the application is quite long, involving about 10 tables and roughly 1500 lines 
in total. When executing it, I found that it hangs in 
StreamTableEnvironment.sqlUpdate, keeps executing some Calcite code, and the 
memory usage keeps growing; after several minutes a 
java.lang.OutOfMemoryError: GC overhead limit exceeded is thrown.
 
I took some thread dumps:
        at 
org.apache.calcite.plan.volcano.RuleQueue.popMatch(RuleQueue.java:475)
        at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:640)
        at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339)
        at 
org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:373)
        at 
org.apache.flink.table.api.TableEnvironment.optimizeLogicalPlan(TableEnvironment.scala:292)
        at 
org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:812)
        at 
org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:860)
        at 
org.apache.flink.table.api.StreamTableEnvironment.writeToSink(StreamTableEnvironment.scala:344)
        at 
org.apache.flink.table.api.TableEnvironment.insertInto(TableEnvironment.scala:879)
        at 
org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:817)
        at 
org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:777)
 
 
        at java.io.PrintWriter.write(PrintWriter.java:473)
        at 
org.apache.calcite.rel.AbstractRelNode$1.explain_(AbstractRelNode.java:415)
        at 
org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:156)
        at 
org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:312)
        at 
org.apache.calcite.rel.AbstractRelNode.computeDigest(AbstractRelNode.java:420)
        at 
org.apache.calcite.rel.AbstractRelNode.recomputeDigest(AbstractRelNode.java:356)
        at 
org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:350)
        at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1484)
        at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:859)
        at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:879)
        at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1755)
        at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:135)
        at 
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
        at 
org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:141)
        at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:212)
        at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:646)
        at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339)
        at 
org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:373)
        at 
org.apache.flink.table.api.TableEnvironment.optimizeLogicalPlan(TableEnvironment.scala:292)
        at 
org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:812)
        at 
org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:860)
        at 
org.apache.flink.table.api.StreamTableEnvironment.writeToSink(StreamTableEnvironment.scala:344)
        at 
org.apache.flink.table.api.TableEnvironment.insertInto(TableEnvironment.scala:879)
        at 
org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:817)
        at 
org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:777)
 *Both point to Calcite code.*
 
I also took a heap dump and found that there are *5,703,373 RexCall instances, 
5,909,525 String instances, and 5,909,692 char[] instances, with a total size 
of 6.8 GB*. I wonder why there are so many RexCall instances here, and why it 
keeps executing Calcite code and seemingly never stops.
|char[]| |5,909,692 (16.4%)|6,873,430,938 (84.3%)|
|java.lang.String| |5,909,525 (16.4%)|165,466,700 (2%)|
|org.apache.calcite.rex.RexLocalRef| |5,901,313 (16.4%)|259,657,772 (3.2%)|
|org.apache.flink.calcite.shaded.com.google.common.collect.RegularImmutableList| |5,739,479 (15.9%

[DISCUSS] Drop Elasticsearch 1 connector

2019-04-03 Thread Chesnay Schepler

Hello everyone,

I'm proposing to remove the connector for Elasticsearch 1.

The connector is used significantly less than the more recent versions (2 & 5 
are downloaded 4-5x more often), and hasn't seen any development for over a 
year, yet it still incurs maintenance overhead due to licensing and testing.





[jira] [Created] (FLINK-12098) Add support for generating optimized logical plan for simple group aggregate on batch

2019-04-03 Thread godfrey he (JIRA)
godfrey he created FLINK-12098:
--

 Summary: Add support for generating optimized logical plan for 
simple group aggregate on batch
 Key: FLINK-12098
 URL: https://issues.apache.org/jira/browse/FLINK-12098
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he


This issue only involves the simple stream group aggregate; complex group 
aggregates (including GROUPING SETS and DISTINCT) will be finished in a 
separate issue.

A logical aggregate will be converted to a StreamExecGroupAggregate, and a 
StreamExecGroupAggregate will be rewritten into two-stage aggregates 
(StreamExecLocalGroupAggregate and StreamExecGlobalGroupAggregate) if 
mini-batch is enabled and all aggregate functions are mergeable.

Retraction rules will also be ported, because multi-level aggregates 
produce retraction messages.
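The two-stage rewrite can be illustrated with a minimal sketch (illustrative 
names, not the planner's actual operators): the local stage pre-aggregates a 
mini-batch per key, and the global stage merges the partial results; this is 
only correct when the aggregate function is mergeable, e.g. SUM:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of two-stage (local/global) group aggregation for a mergeable
// function such as SUM.
public final class TwoStageAggSketch {

    // local stage: fold one mini-batch into per-key partial sums
    static Map<String, Long> localAggregate(Iterable<Map.Entry<String, Long>> batch) {
        Map<String, Long> partials = new HashMap<>();
        for (Map.Entry<String, Long> record : batch) {
            partials.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return partials;
    }

    // global stage: merge partial results from all local instances
    static void globalMerge(Map<String, Long> globalState, Map<String, Long> partials) {
        partials.forEach((key, sum) -> globalState.merge(key, sum, Long::sum));
    }
}
{code}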




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12099) Elasticsearch(1)SinkITCase fails on Java 9

2019-04-03 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-12099:


 Summary: Elasticsearch(1)SinkITCase fails on Java 9
 Key: FLINK-12099
 URL: https://issues.apache.org/jira/browse/FLINK-12099
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / ElasticSearch
Affects Versions: 1.9.0
Reporter: Chesnay Schepler
 Fix For: 1.9.0


The ES1 code fails while gathering information about the JVM. Since the code in 
question is executed in a static initializer block, I don't see a way to 
prevent this from our side.
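A minimal sketch of why this is hard to work around (illustrative code, not 
the ES1 sources): an exception thrown from a static initializer surfaces as 
ExceptionInInitializerError at first use of the class, before any instance 
code runs, so the caller cannot prevent it:

{code:java}
// Sketch: any exception thrown while a class's static initializer runs is
// wrapped in ExceptionInInitializerError at first use of the class.
public class StaticInitFailure {

    static class JvmProbe {
        static {
            // stands in for ES1's JvmInfo gathering boot-class-path info,
            // which Java 9 no longer supports
            if (true) {
                throw new UnsupportedOperationException(
                        "Boot class path mechanism is not supported");
            }
        }
    }

    public static void main(String[] args) {
        try {
            new JvmProbe(); // triggers class initialization
        } catch (ExceptionInInitializerError e) {
            // the only place the failure can be observed by the caller
            System.out.println("cause: " + e.getCause());
        }
    }
}
{code}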

{code}
java.lang.ExceptionInInitializerError
at 
org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:143)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
at 
org.apache.flink.streaming.connectors.elasticsearch.EmbeddedElasticsearchNodeEnvironmentImpl.start(EmbeddedElasticsearchNodeEnvironmentImpl.java:46)
at 
org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase.prepare(ElasticsearchSinkTestBase.java:72)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.UnsupportedOperationException: Boot class path mechanism 
is not supported
at 
java.management/sun.management.RuntimeImpl.getBootClassPath(RuntimeImpl.java:99)
at org.elasticsearch.monitor.jvm.JvmInfo.<clinit>(JvmInfo.java:77)
... 24 more

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12100) Kafka 0.10 tests fail on Java 9

2019-04-03 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-12100:


 Summary: Kafka 0.10 tests fail on Java 9
 Key: FLINK-12100
 URL: https://issues.apache.org/jira/browse/FLINK-12100
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kafka
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.9.0


{code}
java.lang.NoClassDefFoundError: javax/xml/bind/DatatypeConverter
at 
kafka.utils.CoreUtils$.urlSafeBase64EncodeNoPadding(CoreUtils.scala:294)
at kafka.utils.CoreUtils$.generateUuidAsBase64(CoreUtils.scala:282)
at 
kafka.server.KafkaServer$$anonfun$getOrGenerateClusterId$1.apply(KafkaServer.scala:335)
at 
kafka.server.KafkaServer$$anonfun$getOrGenerateClusterId$1.apply(KafkaServer.scala:335)
at scala.Option.getOrElse(Option.scala:121)
at 
kafka.server.KafkaServer.getOrGenerateClusterId(KafkaServer.scala:335)
at kafka.server.KafkaServer.startup(KafkaServer.scala:190)
at 
org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.getKafkaServer(KafkaTestEnvironmentImpl.java:430)
at 
org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.prepare(KafkaTestEnvironmentImpl.java:256)
at 
org.apache.flink.streaming.connectors.kafka.KafkaTestBase.startClusters(KafkaTestBase.java:137)
at 
org.apache.flink.streaming.connectors.kafka.KafkaTestBase.prepare(KafkaTestBase.java:100)
at 
org.apache.flink.streaming.connectors.kafka.KafkaTestBase.prepare(KafkaTestBase.java:92)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.DatatypeConverter
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
... 33 more
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12101) Race condition when concurrently running uploaded jars via REST

2019-04-03 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-12101:
--

 Summary: Race condition when concurrently running uploaded jars via 
REST
 Key: FLINK-12101
 URL: https://issues.apache.org/jira/browse/FLINK-12101
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.7.2, 1.6.4
Reporter: Maximilian Michels


Flink allows uploading and running jars via REST. When multiple uploaded jars 
are invoked interactively to generate the JobGraph, interleaved calls to the 
static initialization of the {{ContextEnvironment}} will override each other 
and produce a local execution of the jar. The local execution uses an 
incorrect class loader and throws an exception like this:

{noformat}
2019-04-02 14:25:05,549 ERROR- Failed to create job graph
java.lang.RuntimeException: Pipeline execution failed
at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:117)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
at 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at 
org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
at 
org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:78)
at 
org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:120)
at 
org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$JarHandlerContext.toJobGraph(JarHandlerUtils.java:117)
at 
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$getJobGraphAsync$7(JarRunHandler.java:151)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution 
failed.
at 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at 
org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:647)
at 
org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
at 
org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineExecutionEnvironment.java:125)
at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:114)
... 18 more
Caused by: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could 
not instantiate outputs in order.
at 
org.apache.flink.streaming.api.graph.StreamConfig.getOutEdgesInOrder(StreamConfig.java:398)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.createStreamRecordWriters(StreamTask.java:1164)
at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:212)
at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:190)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.<init>(SourceStreamTask.java:51)
at 
org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.<init>(StoppableSourceStreamTask.java:39)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.flink.runtime.taskmanager.Task.loadAndInstantiateInvokable(Task.java:1398)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:682)
... 1 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.beam.runners.flink.translation.wrappers.streaming.WorkItemKeySelector
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at 
org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:129)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.flink.util.InstantiationUtil$Cl
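
Stepping back from the truncated trace, the hazard can be reduced to a minimal 
sketch (made-up names, not Flink's actual code) of two threads racing on one 
static context field:

{code:java}
// Sketch: two interleaved jar invocations installing a process-wide context
// through a static field can override each other, so one job ends up running
// against the other's class loader.
public class StaticContextRace {

    private static ClassLoader contextClassLoader; // shared, unsynchronized

    static void runJar(String jobName, ClassLoader jarLoader) throws Exception {
        contextClassLoader = jarLoader; // step 1: install this jar's context
        Thread.sleep(10);               // interleaving window: the other
                                        // thread may overwrite the field here
        ClassLoader seen = contextClassLoader; // step 2: read the context back
        if (seen != jarLoader) {
            System.out.println(jobName + " saw the wrong class loader");
        }
    }

    public static void main(String[] args) throws Exception {
        Thread a = new Thread(() -> invoke("jobA"));
        Thread b = new Thread(() -> invoke("jobB"));
        a.start();
        b.start();
        a.join();
        b.join();
    }

    private static void invoke(String jobName) {
        try {
            runJar(jobName, new ClassLoader() {});
        } catch (Exception e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}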

[jira] [Created] (FLINK-12102) FlinkILoopTest fails on Java 9

2019-04-03 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-12102:


 Summary: FlinkILoopTest fails on Java 9
 Key: FLINK-12102
 URL: https://issues.apache.org/jira/browse/FLINK-12102
 Project: Flink
  Issue Type: Sub-task
  Components: Scala Shell
Reporter: Chesnay Schepler
 Fix For: 1.9.0


{code}
java.lang.NullPointerException
at 
org.apache.flink.api.java.FlinkILoopTest.testConfigurationForwarding(FlinkILoopTest.java:89)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12103) Extend builtin operator metrics with user defined scope and variables

2019-04-03 Thread Ben Marini (JIRA)
Ben Marini created FLINK-12103:
--

 Summary: Extend builtin operator metrics with user defined scope 
and variables
 Key: FLINK-12103
 URL: https://issues.apache.org/jira/browse/FLINK-12103
 Project: Flink
  Issue Type: Wish
  Components: Runtime / Metrics
Affects Versions: 1.7.2
Reporter: Ben Marini


I'm looking for a way to extend the builtin throughput and latency metrics for 
operators with my own metric variables.

My specific use case:

I have a job that defines a list of independent source -> sink streams. I would 
like to add my own metric variables to each of these independent streams. For 
example, something like this:

{code:scala}
class MyFilter extends RichFilterFunction {
  override def open(parameters: Configuration): Unit = {
    val mg = getRuntimeContext.getMetricGroup // Includes "streamName" -> "A|B"

    // Init some user defined metrics here...
  }
}

// Stream A
// Native operator metrics and user defined metrics in rich operators
// include "streamName" -> "A"
streamA = env.withMetricGroup((mg) => mg.addGroup("streamName", "A"))
  .addSource(...)
  .filter(new MyFilter)
  .addSink(...)

// Stream B
// Native operator metrics and user defined metrics in rich operators
// include "streamName" -> "B"
streamB = env.withMetricGroup((mg) => mg.addGroup("streamName", "B"))
  .addSource(...)
  .filter(new MyFilter)
  .addSink(...)
{code}
 
Is this possible? Would a new hook into StreamTransformation have to be added?




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12104) Flink Kafka fails with Incompatible KafkaProducer version / NoSuchFieldException sequenceNumbers

2019-04-03 Thread Tim (JIRA)
Tim created FLINK-12104:
---

 Summary: Flink Kafka fails with Incompatible KafkaProducer version 
/ NoSuchFieldException sequenceNumbers
 Key: FLINK-12104
 URL: https://issues.apache.org/jira/browse/FLINK-12104
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.7.2
Reporter: Tim


FlinkKafkaProducer (in flink-connector-kafka-0.11) tries to access a field 
named `sequenceNumbers` from the KafkaProducer's TransactionManager. You can 
find this line on the [master branch 
here|https://github.com/apache/flink/blob/d6be68670e661091d94a3c65a2704d52fc0e827c/flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/FlinkKafkaProducer.java#L197].

 
{code:java}
Object transactionManager = getValue(kafkaProducer, "transactionManager");
...
Object sequenceNumbers = getValue(transactionManager, "sequenceNumbers");
{code}
 

However, the Kafka TransactionManager no longer has a "sequenceNumbers" field. 
This was changed back on 9/14/2017 (KAFKA-5494) in an effort to support 
multiple in-flight requests while still guaranteeing idempotence. See the 
[commit diff 
here|https://github.com/apache/kafka/commit/5d2422258cb975a137a42a4e08f03573c49a387e#diff-f4ef1afd8792cd2a2e9069cd7ddea630].

Subsequently, when Flink tries to "recoverAndCommit" (see 
FlinkKafkaProducer011), it fails with a "NoSuchFieldException: 
sequenceNumbers", followed by an "Incompatible KafkaProducer version".

Given that the KafkaProducer used is so old (this change was made almost two 
years ago), are there any plans to upgrade? Or are there known compatibility 
issues that prevent the Flink/Kafka connector from doing so?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12105) TUMBLE INTERVAL value errors out for values of 100 or more

2019-04-03 Thread Vinod Mehra (JIRA)
Vinod Mehra created FLINK-12105:
---

 Summary: TUMBLE INTERVAL value errors out for values of 100 or more
 Key: FLINK-12105
 URL: https://issues.apache.org/jira/browse/FLINK-12105
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.7.2
 Environment: [https://github.com/ververica/sql-training]
Reporter: Vinod Mehra


I ran into this while experimenting with different values at Lyft eng. However, 
it is reproducible with [https://github.com/ververica/sql-training] as well. I 
showed this issue to the training instructors during Flink Forward 2019 and 
they asked me to file this bug.

The INTERVAL values work fine up to 99, and error out after that:

*TUMBLE(rideTime, INTERVAL '100' SECOND)*

_org.apache.calcite.sql.validate.SqlValidatorException: Interval field value 
100 exceeds precision of SECOND(2) field_

*TUMBLE(rideTime, INTERVAL '100' MINUTE)*

_org.apache.calcite.sql.validate.SqlValidatorException: Interval field value 
100 exceeds precision of MINUTE(2) field_

*TUMBLE(rideTime, INTERVAL '100' HOUR)*

_org.apache.calcite.sql.validate.SqlValidatorException: Interval field value 
100 exceeds precision of HOUR(2) field_

*TUMBLE(rideTime, INTERVAL '100' DAY)*

_Interval field value 100 exceeds precision of DAY(2) field_
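
A possible workaround, assuming the validator passes Calcite's explicit 
leading-field precision through: interval literals default to a leading-field 
precision of 2 digits (hence SECOND(2)/MINUTE(2)/HOUR(2)/DAY(2) in the errors), 
and the precision can be raised explicitly, e.g. 
*TUMBLE(rideTime, INTERVAL '100' SECOND(3))*.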

(Note: MONTH and YEAR also error out, but for different reasons ("_Only constant 
window intervals with millisecond resolution are supported_"). MONTH and YEAR 
intervals are not supported at all currently; I was told it is hard to 
implement because of timezone differences. I will file that separately.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

2019-04-03 Thread Shuyi Chen
Thanks a lot for driving the FLIP, jincheng. The approach looks good. Adding
multi-language support sounds like a promising direction to expand the
footprint of Flink. Do we have plans for adding Golang support? Many backend
engineers nowadays are familiar with Go but probably not as much with Java, so
adding Golang support would significantly reduce their friction in using
Flink. Also, do we have a design for multi-language UDF support, and what is
the timeline for adding DataStream API support? We would like to help and
contribute, as we have a similar need internally at our company.
Thanks a lot.

Shuyi

On Tue, Apr 2, 2019 at 1:03 AM jincheng sun 
wrote:

> Hi All,
> As Xianda brought up in a previous email, there are a large number of
> data analysis users who want Flink to support Python. At the Flink API
> level, we have the DataStream API, DataSet API, and Table API & SQL, and
> the Table API will become the first-class citizen. The Table API is
> declarative and can be automatically optimized, as mentioned in the Flink
> mid-term roadmap by Stephan. So we are first considering supporting Python
> at the Table level to cater to the current large number of analytics
> users. To further promote Python support at the Flink Table level, Dian,
> Wei and I discussed offline a bit and came up with an initial feature
> outline as follows:
>
> - Python Table API Interface
>   Introduce a set of Python Table API interfaces, including interface
> definitions such as Table, TableEnvironment, TableConfig, etc.
>
> - Implementation Architecture
>   We will offer two alternative architecture options, one for pure Python
> language support and one for an extended multi-language design.
>
> - Job Submission
>   Provide a way to submit (local/remote) Python Table API jobs.
>
> - Python Shell
>   The Python Shell provides an interactive way for users to write and
> execute Flink Python Table API jobs.
>
>
> The design document for FLIP-38 can be found here:
>
>
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
>
> I am looking forward to your comments and feedback.
>
> Best,
> Jincheng
>


[CANCEL][VOTE] Release 1.8.0, release candidate #4

2019-04-03 Thread Aljoscha Krettek
I’m hereby canceling the vote for Flink 1.8.0 RC4 in favour of a new RC that I 
will create shortly now that the blockers are resolved.

> On 1. Apr 2019, at 13:21, Till Rohrmann  wrote:
> 
> Thanks for reporting this problem and opening a JIRA issue. I've created a
> fix for the problem [1].
> 
> [1] https://github.com/apache/flink/pull/8096
> 
> Cheers,
> Till
> 
> On Mon, Apr 1, 2019 at 12:30 AM Richard Deurwaarder  wrote:
> 
>> Hello @Aljoscha and @Rong,
>> 
>> I've described the problem in the mailing list[1] and on stackoverflow[2]
>> before. But the gist is: If there's a firewall between the yarn cluster and
>> the machine submitting the job, we need to be able to set a fixed port (or
>> range of ports) for REST communication with the jobmanager.
>> 
>> It is a regression in the sense that on 1.5 (and 1.6 I believe?) it was
>> possible to work around this by using the legacy mode (non flip-6), but on
>> 1.7 and now 1.8 this is not possible.
>> 
>> I've created FLINK-12075 <
>> https://issues.apache.org/jira/browse/FLINK-12075>
>> for it, I have not made it blocking yet as it is not strictly a regression
>> with regards to 1.7. Perhaps you guys can better determine if you want this
>> added in RC5.
>> 
>> Regards,
>> 
>> Richard
>> 
>> [1]
>> 
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Submitting-job-to-Flink-on-yarn-timesout-on-flip-6-1-5-x-td26199.html#a26383
>> [2] https://stackoverflow.com/q/54771637/988324
>> 
>> On Sat, Mar 30, 2019 at 7:24 PM Rong Rong  wrote:
>> 
>>> Hi @Aljoscha,
>>> 
>>> Based on the previous commit [1] that adds the random port selection
>> code,
>>> it seems like the important part is to unset whatever 'rest.port' setting
>>> previously done. I don't think the current way of setting the BIND_PORT
>>> actually overrides any existing PORT setting. However, I wasn't able to
>>> find any test that is related, maybe @Till can provide more insight here?
>>> 
>>> Maybe @Richard can provide more detail on the YARN run command used to
>>> reproduce the problem?
>>> 
>>> Thanks,
>>> Rong
>>> 
>>> [1]
>>> https://github.com/apache/flink/commit/dbe0e8286d76a5facdb49589b638b87dbde80178#diff-487838863ab693af7008f04cb3359be3R117
>>> 
>>> On Sat, Mar 30, 2019 at 5:51 AM Aljoscha Krettek 
>>> wrote:
>>> 
 @Richard Did this work for you previously? From the change, it seems
>> that
 the port was always set to 0 on YARN even before.
 
> On 28. Mar 2019, at 16:13, Richard Deurwaarder 
>>> wrote:
> 
> -1 (non-binding)
> 
> - Ran integration tests locally (1000+) of our flink job, all
>>> succeeded.
> - Attempted to run job on hadoop, failed. It failed because we have a
> firewall in place and we cannot set the rest port to a specific
>>> port/port
> range.
> Unless I am mistaken, it seems like FLINK-11081 broke the possibility of
> setting a REST port when running on yarn (
> https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102
> )
> Code-wise it seems rather straightforward to fix but I am unsure
>> about
 the
> reason why this is hard-coded to 0 and what the impact would be.
> 
> It would benefit us greatly if a fix for this could make it to 1.8.0.
> 
> Regards,
> 
> Richard
> 
> On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
> 
>> +1 (binding)
>> 
>> Functional checks:
>> 
>> - Built Flink from source (`mvn clean verify`) locally, with success
>> - Ran end-to-end tests locally for 5 times in a loop, no attempts
>>> failed
>> (Hadoop 2.8.4, Scala 2.12)
>> - Manually tested state schema evolution for POJO. Besides the tests that
>> @Congxian already did, additionally tested evolution cases with POJO
>> subclasses + non-registered POJOs.
>> - Manually tested migration of Scala stateful jobs that use case classes /
>> Scala collections as state types, performing the migration across Scala
>> 2.11 to Scala 2.12.
>> - Reviewed release announcement PR
>> 
>> Misc / legal checks:
>> 
>> - checked checksums and signatures
>> - No binaries in source distribution
>> - Staging area does not seem to have any missing artifacts
>> 
>> Cheers,
>> Gordon
>> 
>> On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>> wrote:
>> 
>>> @Shaoxuan
>>> 
>>> The drop in the serializerAvro benchmark, as explained earlier in
>>> previous voting threads of earlier RCs, was due to a slower job
>>> initialization phase caused by slower deserialization of the
>>> AvroSerializer.
>>> Piotr also pointed out that after the number of records was increased
>>> in the serializer benchmarks, this drop was no longer observable
>>> before / after

[jira] [Created] (FLINK-12106) Jobmanager is killing FINISHED taskmanager containers, causing exception in still running Taskmanagers an

2019-04-03 Thread John (JIRA)
John created FLINK-12106:


 Summary: Jobmanager is killing FINISHED taskmanager containers, 
causing exception in still running Taskmanagers an
 Key: FLINK-12106
 URL: https://issues.apache.org/jira/browse/FLINK-12106
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.7.2
 Environment: Hadoop:  hdp/2.5.6.0-40

Flink: 1.7.2
Reporter: John


When running a single Flink job on YARN, some of the taskmanager containers 
reach the FINISHED state before others. It appears that, after receiving the 
final execution state FINISHED from a taskmanager, the jobmanager waits ~68 
seconds and then frees the associated slot in the taskmanager. After an 
additional 60 seconds, the jobmanager stops the same taskmanager because the 
TaskExecutor exceeded the idle timeout.

Meanwhile, other taskmanagers are still working to complete the job. Within 10 
seconds after the taskmanager container above is stopped, the remaining task 
managers receive an exception due to loss of connection to the stopped 
taskmanager. These exceptions result in job failure.
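
The two waits line up with two separate timeouts; a sketch of raising them in 
flink-conf.yaml, assuming the configuration keys from the Flink 1.x docs 
(slot.idle.timeout and resourcemanager.taskmanager-timeout; verify against 
your version):

{code}
# sketch, assumed keys: raise both timeouts so idle TaskManagers
# outlive the slowest part of the job
slot.idle.timeout: 3600000                    # ms; how long a slot may sit idle before release
resourcemanager.taskmanager-timeout: 3600000  # ms; how long a slot-less TaskExecutor may idle
{code}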

 

Relevant logs:

2019-04-03 13:49:00,013 INFO  org.apache.flink.yarn.YarnResourceManager         
            - Registering TaskManager with ResourceID 
container_1553017480503_0158_01_38 
(akka.tcp://flink@hadoop4:42745/user/taskmanager_0) at ResourceManager

2019-04-03 13:49:05,900 INFO  org.apache.flink.yarn.YarnResourceManager         
            - Registering TaskManager with ResourceID 
container_1553017480503_0158_01_59 
(akka.tcp://flink@hadoop9:55042/user/taskmanager_0) at ResourceManager

 

 

2019-04-03 13:48:51,132 INFO  org.apache.flink.yarn.YarnResourceManager         
            - Received new container: container_1553017480503_0158_01_77 - 
Remaining pending container requests: 6

2019-04-03 13:48:52,862 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     
-Dlog.file=/hadoop/yarn/log/application_1553017480503_0158/container_1553017480503_0158_01_77/taskmanager.log

2019-04-03 13:48:57,490 INFO  
org.apache.flink.runtime.io.network.netty.NettyServer         - Successful 
initialization (took 202 ms). Listening on SocketAddress /192.168.230.69:40140.

2019-04-03 13:49:12,575 INFO  org.apache.flink.yarn.YarnResourceManager         
            - Registering TaskManager with ResourceID 
container_1553017480503_0158_01_77 
(akka.tcp://flink@hadoop9:51525/user/taskmanager_0) at ResourceManager

2019-04-03 13:49:12,631 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor            - Allocated slot 
for AllocationID\{42fed3e5a136240c23cc7b394e3249e9}.

2019-04-03 14:58:15,188 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor            - Un-registering 
task and sending final execution state FINISHED to JobManager for task DataSink 
(com.anovadata.alexflinklib.sinks.bucketing.BucketingOutputFormat@26874f2c) 
a4b5fb32830d4561147b2714828109e2.

2019-04-03 14:59:23,049 INFO  
org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Releasing idle 
slot [AllocationID\{42fed3e5a136240c23cc7b394e3249e9}].

2019-04-03 14:59:23,058 INFO  
org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable      - Free slot 
TaskSlot(index:0, state:ACTIVE, resource profile: 
ResourceProfile\{cpuCores=1.7976931348623157E308, heapMemoryInMB=2147483647, 
directMemoryInMB=2147483647, nativeMemoryInMB=2147483647, 
networkMemoryInMB=2147483647}, allocationId: 
AllocationID\{42fed3e5a136240c23cc7b394e3249e9}, jobId: 
a6c4e367698c15cdf168d19a89faff1d).

2019-04-03 15:00:02,641 INFO  org.apache.flink.yarn.YarnResourceManager         
            - Stopping container container_1553017480503_0158_01_77.

2019-04-03 15:00:02,646 INFO  org.apache.flink.yarn.YarnResourceManager         
            - Closing TaskExecutor connection 
container_1553017480503_0158_01_77 because: TaskExecutor exceeded the idle 
timeout.

 

 

2019-04-03 13:48:48,902 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     
-Dlog.file=/data1/hadoop/yarn/log/application_1553017480503_0158/container_1553017480503_0158_01_59/taskmanager.log

2019-04-03 14:59:24,677 INFO  
org.apache.parquet.hadoop.InternalParquetRecordWriter         - Flushing mem 
columnStore to file. allocated memory: 109479981

2019-04-03 15:00:05,696 INFO  
org.apache.parquet.hadoop.InternalParquetRecordWriter         - mem size 
135014409 > 134217728: flushing 1930100 records to disk.

2019-04-03 15:00:05,696 INFO  
org.apache.parquet.hadoop.InternalParquetRecordWriter         - Flushing mem 
columnStore to file. allocated memory: 102677684

2019-04-03 15:00:08,671 ERROR org.apache.flink.runtime.operators.BatchTask      
            - Error in task code:  CHAIN Partition -> FlatMap 

org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
Lost connection to task manager 'hadoop9/192.1

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

2019-04-03 Thread Thomas Weise
Thanks for putting this proposal together.

It would be nice, if you could share a few use case examples (maybe add
them as section to the FLIP?).

The reason I ask: The table API is immensely useful, but it isn't clear to
me what value other language bindings provide without UDF support. With
FLIP-38 it will be possible to write a program in Python, but not execute
Python functions. Without UDF support, isn't it possible to achieve roughly
the same with plain SQL? In which situation would I use the Python API?

There was related discussion regarding UDF support in [1]. If the
assumption is that such support will be added later, then I would like to
circle back to the question why this cannot be built on top of Beam? It
would be nice to clarify the bigger goal before embarking for the first
milestone.

I'm going to comment on other things in the doc.

[1]
https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E

Thomas


On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen  wrote:

> Thanks a lot for driving the FLIP, jincheng. The approach looks good. Adding
> multi-language support sounds like a promising direction to expand the
> footprint of Flink. Do we have plans for adding Golang support? Many backend
> engineers nowadays are familiar with Go but probably not as much with Java,
> so adding Golang support would significantly reduce their friction in using
> Flink. Also, do we have a design for multi-language UDF support, and what is
> the timeline for adding DataStream API support? We would like to help and
> contribute, as we have a similar need internally at our company.
> Thanks a lot.
>
> Shuyi
>
> On Tue, Apr 2, 2019 at 1:03 AM jincheng sun 
> wrote:
>
> > Hi All,
> > As Xianda brought up in a previous email, there are a large number of
> > data analysis users who want Flink to support Python. At the Flink API
> > level, we have the DataStream API, DataSet API, and Table API & SQL, and
> > the Table API will become the first-class citizen. The Table API is
> > declarative and can be automatically optimized, as mentioned in the Flink
> > mid-term roadmap by Stephan. So we are first considering supporting
> > Python at the Table level to cater to the current large number of
> > analytics users. To further promote Python support at the Flink Table
> > level, Dian, Wei and I discussed offline a bit and came up with an
> > initial feature outline as follows:
> >
> > - Python Table API Interface
> >   Introduce a set of Python Table API interfaces, including interface
> > definitions such as Table, TableEnvironment, TableConfig, etc.
> >
> > - Implementation Architecture
> >   We will offer two alternative architecture options, one for pure Python
> > language support and one for an extended multi-language design.
> >
> > - Job Submission
> >   Provide a way to submit (local/remote) Python Table API jobs.
> >
> > - Python Shell
> >   The Python Shell provides an interactive way for users to write and
> > execute Flink Python Table API jobs.
> >
> >
> > The design document for FLIP-38 can be found here:
> >
> >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> >
> > I am looking forward to your comments and feedback.
> >
> > Best,
> > Jincheng
> >
>


[jira] [Created] (FLINK-12107) Use Proxy For DataDog Validation

2019-04-03 Thread Luka Jurukovski (JIRA)
Luka Jurukovski created FLINK-12107:
---

 Summary: Use Proxy For DataDog Validation
 Key: FLINK-12107
 URL: https://issues.apache.org/jira/browse/FLINK-12107
 Project: Flink
  Issue Type: Improvement
Reporter: Luka Jurukovski


Recently, support for a DataDog metrics proxy was added; however, validation of 
the API key ignores the use of the proxy. There are circumstances in which 
proxying is used because the service itself is not reachable directly.

Currently the validation pings DataDog directly using the API key provided:

https://app.datadoghq.com/api/v1/validate?api_key=
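
A minimal sketch of validating through the proxy instead, assuming an OkHttp 
client (which the Datadog reporter module uses); the names and wiring here are 
illustrative, not the reporter's actual code:

{code:java}
import java.net.InetSocketAddress;
import java.net.Proxy;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

// Sketch: perform the api-key validation through the configured proxy
// instead of contacting app.datadoghq.com directly.
public final class ProxiedValidation {

    public static boolean validate(String apiKey, String proxyHost, int proxyPort)
            throws Exception {
        OkHttpClient client = new OkHttpClient.Builder()
                .proxy(new Proxy(Proxy.Type.HTTP,
                        new InetSocketAddress(proxyHost, proxyPort)))
                .build();
        Request request = new Request.Builder()
                .url("https://app.datadoghq.com/api/v1/validate?api_key=" + apiKey)
                .build();
        try (Response response = client.newCall(request).execute()) {
            return response.isSuccessful();
        }
    }
}
{code}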





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


contributor permission

2019-04-03 Thread 邵志鹏
Hi,


I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is shaozhipeng.


Thank you very much.

[VOTE] Release 1.8.0, release candidate #5

2019-04-03 Thread Aljoscha Krettek
Hi everyone,
Please review and vote on the release candidate 5 for Flink 1.8.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which are signed with the key with fingerprint 
F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.8.0-rc5" [5],
* website pull request listing the new release [6]
* website pull request adding announcement blog post [7].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Aljoscha

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc5/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1216
[5] 
https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=33943b15f9eca3fafbd88b9315cf9f0a083b6381
[6] https://github.com/apache/flink-web/pull/180
[7] https://github.com/apache/flink-web/pull/179

P.S. The difference to the previous RCs is small; you can fetch the tags and do 
a "git log release-1.8.0-rc1..release-1.8.0-rc5" to see the difference in 
commits. It contains fixes for the issues that led to the cancellation of the 
previous RCs, plus smaller fixes. Most verification/testing that was carried 
out should apply as is to this RC, so any functional verification that you did 
on previous RCs should easily carry over to this one. This RC does add one new 
feature for the Kafka connector that was on master for quite a while but not on 
release-1.8 because of discussions. The NOTICE-binary file is updated to 
reflect the slight change in ordering; shaded-guava now appears at a different 
part of the dependency tree.

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

2019-04-03 Thread jincheng sun
Hi Jeff, thank you for your encouragement and valuable comments on the
Google doc.

I saw that you participated in many PySpark discussions a long time ago. I
am very grateful and look forward to your follow-up comments on pyFlink!

Thanks,
Jincheng

Jeff Zhang wrote on Tuesday, April 2, 2019 at 10:53 PM:

> Thanks jincheng for driving this. Overall I agree with the approach, just
> left a few comments for details.
>
>
>
> jincheng sun wrote on Tuesday, April 2, 2019 at 4:03 PM:
>
> > Hi All,
> > As Xianda brought up in a previous email, there are a large number of
> > data analysis users who want Flink to support Python. At the Flink API
> > level, we have the DataStream API, DataSet API, and Table API & SQL, and
> > the Table API will become the first-class citizen. The Table API is
> > declarative and can be automatically optimized, as mentioned in the Flink
> > mid-term roadmap by Stephan. So we are first considering supporting
> > Python at the Table level to cater to the current large number of
> > analytics users. To further promote Python support at the Flink Table
> > level, Dian, Wei and I discussed offline a bit and came up with an
> > initial feature outline as follows:
> >
> > - Python Table API Interface
> >   Introduce a set of Python Table API interfaces, including interface
> > definitions such as Table, TableEnvironment, TableConfig, etc.
> >
> > - Implementation Architecture
> >   We will offer two alternative architecture options, one for pure Python
> > language support and one for an extended multi-language design.
> >
> > - Job Submission
> >   Provide a way to submit (local/remote) Python Table API jobs.
> >
> > - Python Shell
> >   The Python Shell provides an interactive way for users to write and
> > execute Flink Python Table API jobs.
> >
> >
> > The design document for FLIP-38 can be found here:
> >
> >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> >
> > I am looking forward to your comments and feedback.
> >
> > Best,
> > Jincheng
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

2019-04-03 Thread jincheng sun
Hi Shuyi,

Glad to see your feedback, and to hear more requirements around
multi-language support!

I think the Flink community is very much looking forward to more language
support, and of course Golang should be on the future support list.
Since the topic of supporting Python on Flink has been researched and
discussed in the community for a long time, I want to support Python in
the Table API as the first stage; support for other languages should be
planned after that. I have not yet thought in detail about how/when to
support Golang, so you are very welcome to share more ideas on how to
support it if you have thoughts. :)

Regarding UDFs, we do have some ideas and design attempts. The related
attempts show that the performance of Python UDFs is not optimistic, and
there are also some problems with Python environment management that
should be considered. After we have done more investigation and
experiments, I will share them with you for discussion in time. Perhaps
after the first stage (Python Table API support) we will then have a
detailed discussion of UDF support.

I think support for the DataStream API should be considered after
supporting UDFs, because the DataStream API is mostly used through
user-defined functions.

We plan to complete the first phase before the release of Flink 1.9, and
to start the UDF support after 1.9. Of course, I am very glad to hear that
you want to contribute to Flink's multi-language support. I believe
nothing is impossible: if more people are interested in a Python Table API
with UDF support and want to contribute more to the community, UDF support
may be there when Flink 1.9 is released. :)

Best,
Jincheng

Shuyi Chen wrote on Thursday, April 4, 2019 at 3:35 AM:

> Thanks a lot for driving the FLIP, jincheng. The approach looks good. Adding
> multi-language support sounds like a promising direction to expand the
> footprint of Flink. Do we have plans for adding Golang support? Many backend
> engineers nowadays are familiar with Go but probably not as much with Java,
> so adding Golang support would significantly reduce their friction in using
> Flink. Also, do we have a design for multi-language UDF support, and what is
> the timeline for adding DataStream API support? We would like to help and
> contribute, as we have a similar need internally at our company.
> Thanks a lot.
>
> Shuyi
>
> On Tue, Apr 2, 2019 at 1:03 AM jincheng sun 
> wrote:
>
> > Hi All,
> > As Xianda brought up in a previous email, there are a large number of
> > data analysis users who want Flink to support Python. At the Flink API
> > level, we have the DataStream API, DataSet API, and Table API & SQL, and
> > the Table API will become the first-class citizen. The Table API is
> > declarative and can be automatically optimized, as mentioned in the Flink
> > mid-term roadmap by Stephan. So we are first considering supporting
> > Python at the Table level to cater to the current large number of
> > analytics users. To further promote Python support at the Flink Table
> > level, Dian, Wei and I discussed offline a bit and came up with an
> > initial feature outline as follows:
> >
> > - Python Table API Interface
> >   Introduce a set of Python Table API interfaces, including interface
> > definitions such as Table, TableEnvironment, TableConfig, etc.
> >
> > - Implementation Architecture
> >   We will offer two alternative architecture options, one for pure Python
> > language support and one for an extended multi-language design.
> >
> > - Job Submission
> >   Provide a way to submit (local/remote) Python Table API jobs.
> >
> > - Python Shell
> >   The Python Shell provides an interactive way for users to write and
> > execute Flink Python Table API jobs.
> >
> >
> > The design document for FLIP-38 can be found here:
> >
> >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> >
> > I am looking forward to your comments and feedback.
> >
> > Best,
> > Jincheng
> >
>


[jira] [Created] (FLINK-12108) Simplify splitting expressions into projections, aggregations & windowProperties

2019-04-03 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-12108:


 Summary: Simplify splitting expressions into projections, 
aggregations & windowProperties
 Key: FLINK-12108
 URL: https://issues.apache.org/jira/browse/FLINK-12108
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)