[jira] [Created] (FLINK-12093) Apache Flink:Active MQ consumer job is getting finished after first message consume.
shiv kumar created FLINK-12093: -- Summary: Apache Flink: ActiveMQ consumer job is getting finished after first message consume. Key: FLINK-12093 URL: https://issues.apache.org/jira/browse/FLINK-12093 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.7.2 Environment: Working in my local IDE (Eclipse). Reporter: shiv kumar Hi Team, below is the code that sets up the execution environment for an Apache Flink job consuming messages from an ActiveMQ topic:
{code:java}
StreamExecutionEnvironment env = createExecutionEnvironment();
connectionFactory = new ActiveMQConnectionFactory("**", "**.",
        "failover:(tcp://amq-master-01:61668)?timeout=3000");
LOG.info("exceptionListener{}", new AMQExceptionListLocal(LOG, true));
RunningChecker runningChecker = new RunningChecker();
runningChecker.setIsRunning(true);
AMQSourceConfig config = new AMQSourceConfig.AMQSourceConfigBuilder()
        .setConnectionFactory(connectionFactory)
        .setDestinationName("test_flink")
        .setDeserializationSchema(deserializationSchema)
        .setRunningChecker(runningChecker)
        .setDestinationType(DestinationType.TOPIC)
        .build();
amqSource = new AMQSourceLocal<>(config);
LOG.info("Check whether ctx is null ::{}", amqSource);
DataStream dataMessage = env.addSource(amqSource);
dataMessage.writeAsText("C:/Users/shivkumar/Desktop/flinksjar/output.txt", WriteMode.OVERWRITE);
System.out.println("Step 1");
env.execute("Check ACTIVE_MQ");
{code}
When we start the job, the topic is created and a message is dequeued from it, but after that the job finishes. What can be done to keep the job running? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
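For context on why a job like this can finish early: a streaming job transitions to FINISHED once every source's run() method returns, so a source that should keep consuming must loop until it is cancelled. Below is a minimal sketch of that pattern (an illustration only, not the ActiveMQ connector's actual code; pollNextMessage() is a hypothetical stand-in for a broker poll such as a JMS consumer.receive()):
{code:java}
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class KeepAliveSource implements SourceFunction<String> {

    private volatile boolean isRunning = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // The job stays RUNNING only while this loop is alive.
        while (isRunning) {
            String message = pollNextMessage(); // hypothetical broker poll
            if (message != null) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(message);
                }
            }
        }
    }

    @Override
    public void cancel() {
        isRunning = false;
    }

    private String pollNextMessage() {
        return null; // placeholder for the actual broker call
    }
}
{code}
If the AMQSourceLocal wrapper's run() returns after the first message (for example because the RunningChecker flag flips), the job will finish exactly as described above.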
[jira] [Created] (FLINK-12094) Introduce sort merge join operator to blink batch
Jingsong Lee created FLINK-12094: Summary: Introduce sort merge join operator to blink batch Key: FLINK-12094 URL: https://issues.apache.org/jira/browse/FLINK-12094 Project: Flink Issue Type: New Feature Components: Table SQL / Runtime Reporter: Jingsong Lee Assignee: Jingsong Lee Introduce SortMergeJoinOperator: an implementation that realizes the join through a sort-merge join strategy. It supports all SQL join types: INNER, LEFT, RIGHT, FULL, SEMI, ANTI.
1. In most cases, its performance is weaker than HashJoin.
2. It is more stable than HashJoin, since most data can be sorted stably.
3. SortMergeJoin should be the best choice if the sort can be omitted, as in a multi-level join cascade on the same key.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
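As a refresher on the strategy itself, independent of the blink operator's actual implementation (the sketch below is generic illustration code): with both inputs sorted on the join key, a single forward pass advances whichever side currently holds the smaller key and emits the cross product of each pair of equal-key runs.
{code:java}
import java.util.ArrayList;
import java.util.List;

/** Generic sort-merge inner join over two inputs pre-sorted by key. */
public final class SortMergeJoinSketch {

    /** Rows are int[]{key, value}; both arrays must be sorted by key. */
    public static List<String> innerJoin(int[][] left, int[][] right) {
        List<String> out = new ArrayList<>();
        int l = 0, r = 0;
        while (l < left.length && r < right.length) {
            int lk = left[l][0], rk = right[r][0];
            if (lk < rk) {
                l++; // advance the side with the smaller key
            } else if (lk > rk) {
                r++;
            } else {
                // equal keys: emit the cross product of both equal-key runs
                int rStart = r;
                while (l < left.length && left[l][0] == lk) {
                    for (r = rStart; r < right.length && right[r][0] == lk; r++) {
                        out.add(lk + ": " + left[l][1] + "," + right[r][1]);
                    }
                    l++;
                }
            }
        }
        return out;
    }
}
{code}
Point 1 above reflects the cost of the upfront sort; point 3 is the case where upstream operators already produce data sorted on the join key, making the sort free and sort-merge the clear winner.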
Re: [Proposal] Shuffle resources lifecycle management
Hi Andrey, Thanks for putting together this valuable proposal. It really extends and enhances the details of the partition lifecycle management previously mentioned in FLIP-31. I reviewed the Google doc and agree with the general points in it. One implicit point in the doc is that the shuffle is not limited to serving only the downstream and upstream within one job; data shuffle may even be required across jobs, as in the interactive programming feature, so the ShuffleMaster is not coupled with the JobMaster. These thoughts are also consistent with our previous confirmation, and we can keep pushing in this direction in the long term. We will further focus on them in specific PRs, considering the feasibility for the upcoming 1.9 release. Best, Zhijiang -- From: Andrey Zagrebin Send Time: Friday, March 29, 2019 18:38 To: dev Subject: [Proposal] Shuffle resources lifecycle management Hi All, While working on the pluggable shuffle architecture and looking into the interactive programming and fine-grained recovery efforts, we realized that lifecycle management of intermediate result partitions needs more detailed discussion to enable the envisioned use cases. I prepared a design document to address this concern. The document proposes certain extensions to FLIP-31 (Design of Pluggable Shuffle Service): https://docs.google.com/document/d/13vAJJxfRXAwI4MtO8dux8hHnNMw2Biu5XRrb_hvGehA Looking forward to your feedback. Thanks, Andrey
[jira] [Created] (FLINK-12095) Canceling vs cancelling
Mike Pedersen created FLINK-12095: - Summary: Canceling vs cancelling Key: FLINK-12095 URL: https://issues.apache.org/jira/browse/FLINK-12095 Project: Flink Issue Type: Bug Components: Documentation Reporter: Mike Pedersen British and American English are mixed in some places, leading to confusion between "canceled" and "cancelled", and between "canceling" and "cancelling". For example, it is incorrectly called "cancelling" here: https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html Likewise in the state diagram of the ExecutionState itself: https://github.com/apache/flink/blob/8ed85fe49b7595546a8f968e0faa1fa7d4da47ec/flink-runtime/src/main/java/org/apache/flink/runtime/execution/ExecutionState.java#L31 The same goes for the /jobs REST schema. It is unclear whether this is due to incorrect documentation or a misspelling in the JSON output, i.e. whether the schema is wrong, or correct but misspelled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Apply for JIRA permission
You now have contributor permissions. On 03/04/2019 08:18, Hefei Li wrote: Hi guys, I want to contribute to Apache Flink. Would you please give me the permission as a contributor? My JIRA username is *lhfei*. === Best Regards Hefei Li MP: +86 18701581473 MSN: lh...@live.cn ===
[jira] [Created] (FLINK-12096) Flink counter state getting stuck on usin
Yogesh Nandigamq created FLINK-12096: Summary: Flink counter state getting stuck on usin Key: FLINK-12096 URL: https://issues.apache.org/jira/browse/FLINK-12096 Project: Flink Issue Type: Bug Components: Runtime / State Backends Reporter: Yogesh Nandigamq I am working on a job that dynamically creates counters for a keyed stream, using a counter ValueState and the FileSystem state backend. The counters get stuck after a few records, most probably because the counter value state stops being updated. The FileSystem state backend was used because, with the RocksDB state backend, the counter value state was not updated even from the first record. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
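For reference, the usual per-key counter pattern looks like the sketch below (a minimal illustration, assuming the reporter's setup is roughly similar). A detail that often causes "stuck" counters is that the new value must be written back with update(); reading and modifying the value alone persists nothing:
{code:java}
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

/** Per-key counter backed by ValueState. */
public class CountPerKey extends RichFlatMapFunction<String, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void flatMap(String value, Collector<Long> out) throws Exception {
        Long current = count.value();   // null on a key's first record
        long next = (current == null ? 0L : current) + 1;
        count.update(next);             // omitting this leaves the counter stuck
        out.collect(next);
    }
}
{code}
The backend difference mentioned above fits this picture: RocksDB serializes state on every access, so a state object mutated in place without update() is never persisted, while heap-based backends hand out on-heap references and can mask the same bug.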
[jira] [Created] (FLINK-12097) Flink SQL hangs in StreamTableEnvironment.sqlUpdate, keeps executing and seems to never stop, finally leading to java.lang.OutOfMemoryError: GC overhead limit exceeded
xu created FLINK-12097: -- Summary: Flink SQL hangs in StreamTableEnvironment.sqlUpdate, keeps executing and seems to never stop, finally leading to java.lang.OutOfMemoryError: GC overhead limit exceeded Key: FLINK-12097 URL: https://issues.apache.org/jira/browse/FLINK-12097 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.7.2 Reporter: xu Attachments: DSL.txt Hi Experts, There is a Flink application (version 1.7.2) written in Flink SQL. The SQL in the application is quite long: it involves about 10 tables and is about 1500 lines in total. When executing, I found that it hangs in StreamTableEnvironment.sqlUpdate, keeps executing Calcite code, and the memory usage keeps growing; after several minutes a java.lang.OutOfMemoryError: GC overhead limit exceeded is thrown. I took some thread dumps:
{code}
    at org.apache.calcite.plan.volcano.RuleQueue.popMatch(RuleQueue.java:475)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:640)
    at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339)
    at org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:373)
    at org.apache.flink.table.api.TableEnvironment.optimizeLogicalPlan(TableEnvironment.scala:292)
    at org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:812)
    at org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:860)
    at org.apache.flink.table.api.StreamTableEnvironment.writeToSink(StreamTableEnvironment.scala:344)
    at org.apache.flink.table.api.TableEnvironment.insertInto(TableEnvironment.scala:879)
    at org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:817)
    at org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:777)
{code}
{code}
    at java.io.PrintWriter.write(PrintWriter.java:473)
    at org.apache.calcite.rel.AbstractRelNode$1.explain_(AbstractRelNode.java:415)
    at org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:156)
    at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:312)
    at org.apache.calcite.rel.AbstractRelNode.computeDigest(AbstractRelNode.java:420)
    at org.apache.calcite.rel.AbstractRelNode.recomputeDigest(AbstractRelNode.java:356)
    at org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:350)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1484)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:859)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:879)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1755)
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:135)
    at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
    at org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:141)
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:212)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:646)
    at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339)
    at org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:373)
    at org.apache.flink.table.api.TableEnvironment.optimizeLogicalPlan(TableEnvironment.scala:292)
    at org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:812)
    at org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:860)
    at org.apache.flink.table.api.StreamTableEnvironment.writeToSink(StreamTableEnvironment.scala:344)
    at org.apache.flink.table.api.TableEnvironment.insertInto(TableEnvironment.scala:879)
    at org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:817)
    at org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:777)
{code}
*Both point to Calcite code.* I also took a heap dump and found that there are *5,703,373 RexCall instances, 5,909,525 String instances, and 5,909,692 char[] instances; the total size is 6.8 GB*. I wonder why there are so many RexCall instances here, and why it keeps executing Calcite code and seems to never stop.
||Class||Instances||Total Size||
|char[]|5,909,692 (16.4%)|6,873,430,938 (84.3%)|
|java.lang.String|5,909,525 (16.4%)|165,466,700 (2%)|
|org.apache.calcite.rex.RexLocalRef|5,901,313 (16.4%)|259,657,772 (3.2%)|
|org.apache.flink.calcite.shaded.com.google.common.collect.RegularImmutableList|5,739,479 (15.9%)| |
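Not a fix, but a mitigation that is sometimes suggested for very large SQL pipelines on this planner (a sketch under the assumption that the pipeline can be cut at some intermediate table): convert an intermediate Table to a DataStream and register it back, so each optimizer run only sees part of the relational tree instead of all 1500 lines at once.
{code:java}
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class PlanSplitting {
    // queryPart1/queryPart2 are hypothetical halves of the original SQL.
    public static void splitPlan(StreamTableEnvironment tEnv, String queryPart1, String queryPart2) {
        Table intermediate = tEnv.sqlQuery(queryPart1);
        // Materializing the boundary as a DataStream cuts the plan in two.
        DataStream<Tuple2<Boolean, Row>> stream = tEnv.toRetractStream(intermediate, Row.class);
        // Register the boundary back as a table; a real job would first map the
        // retract stream (Tuple2<Boolean, Row>) back to its payload fields.
        tEnv.registerTable("intermediate_result", tEnv.fromDataStream(stream));
        tEnv.sqlUpdate(queryPart2); // e.g. INSERT INTO sink SELECT ... FROM intermediate_result
    }
}
{code}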
[DISCUSS] Drop Elasticsearch 1 connector
Hello everyone, I'm proposing to remove the connector for Elasticsearch 1. The connector is used significantly less than the more recent versions (the 2 and 5 connectors are downloaded 4-5x more often), and it hasn't seen any development for over a year, yet it still incurs maintenance overhead due to licensing and testing.
[jira] [Created] (FLINK-12098) Add support for generating optimized logical plan for simple group aggregate on batch
godfrey he created FLINK-12098: -- Summary: Add support for generating optimized logical plan for simple group aggregate on batch Key: FLINK-12098 URL: https://issues.apache.org/jira/browse/FLINK-12098 Project: Flink Issue Type: New Feature Components: Table SQL / Planner Reporter: godfrey he Assignee: godfrey he This issue only covers the simple stream group aggregate; complex group aggregates (including GROUPING SETS and DISTINCT) will be finished in a new issue. A logical aggregate will be converted to a StreamExecGroupAggregate, and a StreamExecGroupAggregate will be rewritten into two-stage aggregates (StreamExecLocalGroupAggregate and StreamExecGlobalGroupAggregate) if mini-batch is enabled and all aggregate functions are mergeable. Retraction rules will also be ported, because multi-level aggregates produce retraction messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
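The "mergeable" condition exists because the two-stage rewrite computes partial accumulators per mini-batch locally and merges them globally. A generic sketch of that contract (illustration only, not the planner's actual classes):
{code:java}
import java.util.List;

/** Sketch of a mergeable aggregate: local partials combine into a global result. */
final class AvgAccumulator {
    long sum;
    long count;

    void accumulate(long value) {        // local (per mini-batch) phase
        sum += value;
        count++;
    }

    void merge(AvgAccumulator other) {   // global phase: combine partial accumulators
        sum += other.sum;
        count += other.count;
    }

    static double globalAvg(List<AvgAccumulator> partials) {
        AvgAccumulator total = new AvgAccumulator();
        for (AvgAccumulator p : partials) {
            total.merge(p);
        }
        return total.count == 0 ? Double.NaN : (double) total.sum / total.count;
    }
}
{code}
An aggregate whose result depends on seeing all rows in one place, and which therefore has no well-defined merge of partials, cannot be split this way; that is why the rewrite is gated on every aggregate function being mergeable.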
[jira] [Created] (FLINK-12099) Elasticsearch(1)SinkITCase fails on Java 9
Chesnay Schepler created FLINK-12099: Summary: Elasticsearch(1)SinkITCase fails on Java 9 Key: FLINK-12099 URL: https://issues.apache.org/jira/browse/FLINK-12099 Project: Flink Issue Type: Sub-task Components: Connectors / ElasticSearch Affects Versions: 1.9.0 Reporter: Chesnay Schepler Fix For: 1.9.0 The ES1 code fails while gathering information about the JVM. Since the code in question is executed in a static initializer block, I don't see a way to prevent this from our side.
{code}
java.lang.ExceptionInInitializerError
    at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:143)
    at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
    at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
    at org.apache.flink.streaming.connectors.elasticsearch.EmbeddedElasticsearchNodeEnvironmentImpl.start(EmbeddedElasticsearchNodeEnvironmentImpl.java:46)
    at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkTestBase.prepare(ElasticsearchSinkTestBase.java:72)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.UnsupportedOperationException: Boot class path mechanism is not supported
    at java.management/sun.management.RuntimeImpl.getBootClassPath(RuntimeImpl.java:99)
    at org.elasticsearch.monitor.jvm.JvmInfo.<clinit>(JvmInfo.java:77)
    ... 24 more
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12100) Kafka 0.10 tests fail on Java 9
Chesnay Schepler created FLINK-12100: Summary: Kafka 0.10 tests fail on Java 9 Key: FLINK-12100 URL: https://issues.apache.org/jira/browse/FLINK-12100 Project: Flink Issue Type: Sub-task Components: Connectors / Kafka Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.9.0
{code}
java.lang.NoClassDefFoundError: javax/xml/bind/DatatypeConverter
    at kafka.utils.CoreUtils$.urlSafeBase64EncodeNoPadding(CoreUtils.scala:294)
    at kafka.utils.CoreUtils$.generateUuidAsBase64(CoreUtils.scala:282)
    at kafka.server.KafkaServer$$anonfun$getOrGenerateClusterId$1.apply(KafkaServer.scala:335)
    at kafka.server.KafkaServer$$anonfun$getOrGenerateClusterId$1.apply(KafkaServer.scala:335)
    at scala.Option.getOrElse(Option.scala:121)
    at kafka.server.KafkaServer.getOrGenerateClusterId(KafkaServer.scala:335)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:190)
    at org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.getKafkaServer(KafkaTestEnvironmentImpl.java:430)
    at org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.prepare(KafkaTestEnvironmentImpl.java:256)
    at org.apache.flink.streaming.connectors.kafka.KafkaTestBase.startClusters(KafkaTestBase.java:137)
    at org.apache.flink.streaming.connectors.kafka.KafkaTestBase.prepare(KafkaTestBase.java:100)
    at org.apache.flink.streaming.connectors.kafka.KafkaTestBase.prepare(KafkaTestBase.java:92)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.DatatypeConverter
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:185)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
    ... 33 more
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12101) Race condition when concurrently running uploaded jars via REST
Maximilian Michels created FLINK-12101: -- Summary: Race condition when concurrently running uploaded jars via REST Key: FLINK-12101 URL: https://issues.apache.org/jira/browse/FLINK-12101 Project: Flink Issue Type: Bug Components: Runtime / REST Affects Versions: 1.7.2, 1.6.4 Reporter: Maximilian Michels Flink allows uploading and running jars via REST. When multiple uploaded jars are invoked interactively to generate the JobGraph and the calls are interleaved, the static initializations of the {{ContextEnvironment}} override each other and produce a local execution of the jar. The local execution uses an incorrect class loader and throws an exception like this:
{noformat}
2019-04-02 14:25:05,549 ERROR - Failed to create job graph
java.lang.RuntimeException: Pipeline execution failed
    at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:117)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
    at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
    at org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:78)
    at org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:120)
    at org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$JarHandlerContext.toJobGraph(JarHandlerUtils.java:117)
    at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$getJobGraphAsync$7(JarRunHandler.java:151)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
    at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:647)
    at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
    at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineExecutionEnvironment.java:125)
    at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:114)
    ... 18 more
Caused by: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could not instantiate outputs in order.
    at org.apache.flink.streaming.api.graph.StreamConfig.getOutEdgesInOrder(StreamConfig.java:398)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.createStreamRecordWriters(StreamTask.java:1164)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:212)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.<init>(StreamTask.java:190)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.<init>(SourceStreamTask.java:51)
    at org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.<init>(StoppableSourceStreamTask.java:39)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.flink.runtime.taskmanager.Task.loadAndInstantiateInvokable(Task.java:1398)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:682)
    ... 1 more
Caused by: java.lang.ClassNotFoundException: org.apache.beam.runners.flink.translation.wrappers.streaming.WorkItemKeySelector
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:129)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.flink.util.InstantiationUtil$Cl
[jira] [Created] (FLINK-12102) FlinkILoopTest fails on Java 9
Chesnay Schepler created FLINK-12102: Summary: FlinkILoopTest fails on Java 9 Key: FLINK-12102 URL: https://issues.apache.org/jira/browse/FLINK-12102 Project: Flink Issue Type: Sub-task Components: Scala Shell Reporter: Chesnay Schepler Fix For: 1.9.0 {code} java.lang.NullPointerException at org.apache.flink.api.java.FlinkILoopTest.testConfigurationForwarding(FlinkILoopTest.java:89) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12103) Extend builtin operator metrics with user defined scope and variables
Ben Marini created FLINK-12103: -- Summary: Extend builtin operator metrics with user defined scope and variables Key: FLINK-12103 URL: https://issues.apache.org/jira/browse/FLINK-12103 Project: Flink Issue Type: Wish Components: Runtime / Metrics Affects Versions: 1.7.2 Reporter: Ben Marini I'm looking for a way to extend the built-in throughput and latency metrics for operators with my own metric variables. My specific use case: I have a job that defines a list of independent source -> sink streams. I would like to add my own metric variables to each of these independent streams. For example, something like this:
{code:scala}
class MyFilter extends RichFilterFunction {
  override def open(parameters: Configuration): Unit = {
    val mg = getRuntimeContext.getMetricGroup // includes "streamName" -> "A|B"
    // Init some user-defined metrics here...
  }
}

// Stream A
// Native operator metrics and user-defined metrics in rich operators include "streamName" -> "A"
streamA = env.withMetricGroup((mg) => mg.addGroup("streamName", "A"))
  .addSource(...)
  .filter(new MyFilter)
  .addSink(...)

// Stream B
// Native operator metrics and user-defined metrics in rich operators include "streamName" -> "B"
streamB = env.withMetricGroup((mg) => mg.addGroup("streamName", "B"))
  .addSource(...)
  .filter(new MyFilter)
  .addSink(...)
{code}
Is this possible? Would a new hook into StreamTransformation have to be added? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
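For comparison, what is already possible today is adding the extra scope per operator inside each rich function's open(), rather than once per stream at the environment level. A sketch using the existing metric group API (the per-stream name must be passed to each operator by hand):
{code:java}
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

/** Adds a user-defined scope to metrics from within the operator itself. */
public class MyFilter extends RichFilterFunction<String> {

    private final String streamName;   // e.g. "A" or "B", passed in per stream
    private transient Counter passed;

    public MyFilter(String streamName) {
        this.streamName = streamName;
    }

    @Override
    public void open(Configuration parameters) {
        passed = getRuntimeContext().getMetricGroup()
                .addGroup("streamName", streamName)   // user-defined key/value scope
                .counter("passedRecords");
    }

    @Override
    public boolean filter(String value) {
        passed.inc();
        return true;
    }
}
{code}
The limitation this wish targets is that such a group only scopes user-created metrics; the built-in operator throughput and latency metrics keep their default scope, and today there is no environment-level hook like the withMetricGroup shown above.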
[jira] [Created] (FLINK-12104) Flink Kafka fails with Incompatible KafkaProducer version / NoSuchFieldException sequenceNumbers
Tim created FLINK-12104: --- Summary: Flink Kafka fails with Incompatible KafkaProducer version / NoSuchFieldException sequenceNumbers Key: FLINK-12104 URL: https://issues.apache.org/jira/browse/FLINK-12104 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.7.2 Reporter: Tim FlinkKafkaProducer (in flink-connector-kafka-0.11) tries to access a field named `sequenceNumbers` in the KafkaProducer's TransactionManager. You can find this line on the [master branch here|https://github.com/apache/flink/blob/d6be68670e661091d94a3c65a2704d52fc0e827c/flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/FlinkKafkaProducer.java#L197].
{code:java}
Object transactionManager = getValue(kafkaProducer, "transactionManager");
...
Object sequenceNumbers = getValue(transactionManager, "sequenceNumbers");
{code}
However, the Kafka TransactionManager no longer has a "sequenceNumbers" field. This was changed back on 9/14/2017 (KAFKA-5494) in an effort to support multiple in-flight requests while still guaranteeing idempotence. See the [commit diff here|https://github.com/apache/kafka/commit/5d2422258cb975a137a42a4e08f03573c49a387e#diff-f4ef1afd8792cd2a2e9069cd7ddea630]. Subsequently, when Flink tries to "recoverAndCommit" (see FlinkKafkaProducer011), it fails with a "NoSuchFieldException: sequenceNumbers", followed by "Incompatible KafkaProducer version". Given that the KafkaProducer used is so old (this change was made almost two years ago), are there any plans to upgrade? Or are there some known compatibility issues that prevent the Flink/Kafka connector from doing so? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
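For context on the failure mode, the connector reads private KafkaProducer internals reflectively, roughly like the sketch below (an illustration; the helper name is hypothetical, not the connector's exact code). Because the lookup is by string name, a field rename in a newer kafka-clients only surfaces at runtime as the NoSuchFieldException described above, rather than at compile time:
{code:java}
import java.lang.reflect.Field;

final class ReflectionSketch {
    /** Reads a private field by name; brittle against upstream refactorings. */
    static Object getField(Object target, String fieldName)
            throws NoSuchFieldException, IllegalAccessException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true);
        return field.get(target);
    }
}
{code}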
[jira] [Created] (FLINK-12105) TUMBLE INTERVAL errors out for values of 100 or more
Vinod Mehra created FLINK-12105: --- Summary: TUMBLE INTERVAL errors out for values of 100 or more Key: FLINK-12105 URL: https://issues.apache.org/jira/browse/FLINK-12105 Project: Flink Issue Type: Bug Components: Table SQL / Client Affects Versions: 1.7.2 Environment: [https://github.com/ververica/sql-training] Reporter: Vinod Mehra I ran into this while experimenting with different values at Lyft eng. However, it is reproducible with [https://github.com/ververica/sql-training] as well. I showed this issue to the training instructors during flink-forward-19 and they asked me to file this bug. The INTERVAL values work fine until 99 and error out after that:
*TUMBLE(rideTime, INTERVAL '100' SECOND)* _org.apache.calcite.sql.validate.SqlValidatorException: Interval field value 100 exceeds precision of SECOND(2) field_
*TUMBLE(rideTime, INTERVAL '100' MINUTE)* _org.apache.calcite.sql.validate.SqlValidatorException: Interval field value 100 exceeds precision of MINUTE(2) field_
*TUMBLE(rideTime, INTERVAL '100' HOUR)* _org.apache.calcite.sql.validate.SqlValidatorException: Interval field value 100 exceeds precision of HOUR(2) field_
*TUMBLE(rideTime, INTERVAL '100' DAY)* _Interval field value 100 exceeds precision of DAY(2) field_
(Note: MONTH and YEAR also error out, but for different reasons ("_Only constant window intervals with millisecond resolution are supported_"). MONTH and YEAR intervals are not supported at all currently. I was told that this is hard to implement because of timezone differences. I will file that separately.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
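For what it's worth, the "(2)" in the error is the interval's leading-field precision, which defaults to 2 digits in SQL, and Calcite accepts a wider precision declared on the leading field. A possible workaround, sketched below and untested against the training setup (table and column names are taken from the report):
{code:java}
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class IntervalPrecision {
    // Widens the leading-field precision from the default SECOND(2) to
    // SECOND(3) so that the literal 100 fits.
    public static Table hundredSecondWindows(StreamTableEnvironment tEnv) {
        return tEnv.sqlQuery(
            "SELECT TUMBLE_START(rideTime, INTERVAL '100' SECOND(3)) AS wStart, COUNT(*) AS cnt " +
            "FROM Rides " +
            "GROUP BY TUMBLE(rideTime, INTERVAL '100' SECOND(3))");
    }
}
{code}
Equivalently, the value can be kept within the default precision by switching units, e.g. INTERVAL '1:40' MINUTE TO SECOND for 100 seconds.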
Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI
Thanks a lot for driving the FLIP, jincheng. The approach looks good. Adding multi-language support sounds like a promising direction to expand the footprint of Flink. Do we have a plan for adding Golang support? As many backend engineers nowadays are familiar with Go, but probably not as much with Java, adding Golang support would significantly reduce their friction in using Flink. Also, do we have a design for multi-language UDF support, and what's the timeline for adding DataStream API support? We would like to help and contribute as well, as we do have a similar need internally at our company. Thanks a lot. Shuyi On Tue, Apr 2, 2019 at 1:03 AM jincheng sun wrote: > Hi All, > As Xianda brought up in the previous email, There are a large number of > data analysis users who want flink to support Python. At the Flink API > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API will > become the first-class citizen. Table API is declarative and can be > automatically optimized, which is mentioned in the Flink mid-term roadmap > by Stephan. So we first considering supporting Python at the Table level to > cater to the current large number of analytics users. For further promote > Python support in flink table level. Dian, Wei and I discussed offline a > bit and came up with an initial features outline as follows: > > - Python TableAPI Interface > Introduce a set of Python Table API interfaces, including interface > definitions such as Table, TableEnvironment, TableConfig, etc. > > - Implementation Architecture > We will offer two alternative architecture options, one for pure Python > language support and one for extended multi-language design. > > - Job Submission > Provide a way that can submit(local/remote) Python Table API jobs. > > - Python Shell > Python Shell is to provide an interactive way for users to write and > execute flink Python Table API jobs. > > > The design document for FLIP-38 can be found here: > > > https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing > > I am looking forward to your comments and feedback. > > Best, > Jincheng >
[CANCEL][VOTE] Release 1.8.0, release candidate #4
I’m hereby canceling the vote for Flink 1.8.0 RC4 in favour of a new RC that I will create shortly now that the blockers are resolved. > On 1. Apr 2019, at 13:21, Till Rohrmann wrote: > > Thanks for reporting this problem and opening a JIRA issue. I've created a > fix for the problem [1]. > > [1] https://github.com/apache/flink/pull/8096 > > Cheers, > Till > > On Mon, Apr 1, 2019 at 12:30 AM Richard Deurwaarder wrote: > >> Hello @Aljoscha and @Rong, >> >> I've described the problem in the mailing list[1] and on stackoverflow[2] >> before. But the gist is: If there's a firewall between the yarn cluster and >> the machine submitting the job, we need to be able to set a fixed port (or >> range of ports) for REST communication with the jobmanager. >> >> It is a regression in the sense that on 1.5 (and 1.6 I believe?) it was >> possible to work around this by using the legacy mode (non flip-6), but on >> 1.7 and now 1.8 this is not possible. >> >> I've created FLINK-12075 < >> https://issues.apache.org/jira/browse/FLINK-12075> >> for it, I have not made it blocking yet as it is not strictly a regression >> with regards to 1.7. Perhaps you guys can better determine if you want this >> added in RC5. >> >> Regards, >> >> Richard >> >> [1] >> >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Submitting-job-to-Flink-on-yarn-timesout-on-flip-6-1-5-x-td26199.html#a26383 >> [2] https://stackoverflow.com/q/54771637/988324 >> >> On Sat, Mar 30, 2019 at 7:24 PM Rong Rong wrote: >> >>> Hi @Aljoscha, >>> >>> Based on the previous commit [1] that adds the random port selection >> code, >>> it seems like the important part is to unset whatever 'rest.port' setting >>> previously done. I don't think the current way of setting the BIND_PORT >>> actually overrides any existing PORT setting. However, I wasn't able to >>> find any test that is related, maybe @Till can provide more insight here? >>> >>> Maybe @Richard can provide more detail on the YARN run command used to >>> reproduce the problem? >>> >>> Thanks, >>> Rong >>> >>> [1] >>> >>> >> https://github.com/apache/flink/commit/dbe0e8286d76a5facdb49589b638b87dbde80178#diff-487838863ab693af7008f04cb3359be3R117 >>> >>> On Sat, Mar 30, 2019 at 5:51 AM Aljoscha Krettek >>> wrote: >>> @Richard Did this work for you previously? From the change, it seems >> that the port was always set to 0 on YARN even before. > On 28. Mar 2019, at 16:13, Richard Deurwaarder >>> wrote: > > -1 (non-binding) > > - Ran integration tests locally (1000+) of our flink job, all >>> succeeded. > - Attempted to run job on hadoop, failed. It failed because we have a > firewall in place and we cannot set the rest port to a specific >>> port/port > range. > Unless I am mistaken, it seems like FLINK-11081 broke the possibility >>> of > setting a REST port when running on yarn ( > >>> >> https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102 > ) > Code-wise it seems rather straightforward to fix but I am unsure >> about the > reason why this is hard-coded to 0 and what the impact would be. > > It would benefit us greatly if a fix for this could make it to 1.8.0. 
> > Regards, > > Richard > > On Thu, Mar 28, 2019 at 9:54 AM Tzu-Li (Gordon) Tai < >>> tzuli...@apache.org > > wrote: > >> +1 (binding) >> >> Functional checks: >> >> - Built Flink from source (`mvn clean verify`) locally, with success >> - Ran end-to-end tests locally for 5 times in a loop, no attempts >>> failed >> (Hadoop 2.8.4, Scala 2.12) >> - Manually tested state schema evolution for POJO. Besides the tests that >> @Congxian already did, additionally tested evolution cases with POJO >> subclasses + non-registered POJOs. >> - Manually tested migration of Scala stateful jobs that use case classes / >> Scala collections as state types, performing the migration across >>> Scala >> 2.11 to Scala 2.12. >> - Reviewed release announcement PR >> >> Misc / legal checks: >> >> - checked checksums and signatures >> - No binaries in source distribution >> - Staging area does not seem to have any missing artifacts >> >> Cheers, >> Gordon >> >> On Thu, Mar 28, 2019 at 4:52 PM Tzu-Li (Gordon) Tai < tzuli...@apache.org> >> wrote: >> >>> @Shaoxuan >>> >>> The drop in the serializerAvro benchmark, as explained earlier in >> previous >>> voting threads of earlier RCs, was due to a slower job >> initialization >> phase >>> caused by slower deserialization of the AvroSerializer. >>> Piotr also pointed out that after the number of records was >> increased in >>> the serializer benchmarks, this drop was no longer observable >> before >>> / >>> after
[jira] [Created] (FLINK-12106) Jobmanager is killing FINISHED taskmanager containers, causing exception in still running Taskmanagers an
John created FLINK-12106: Summary: Jobmanager is killing FINISHED taskmanager containers, causing exception in still running Taskmanagers an Key: FLINK-12106 URL: https://issues.apache.org/jira/browse/FLINK-12106 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.7.2 Environment: Hadoop: hdp/2.5.6.0-40 Flink: 1.7.2 Reporter: John When running a single Flink job on YARN, some of the taskmanager containers reach the FINISHED state before others. It appears that, after receiving the final execution state FINISHED from a taskmanager, the jobmanager waits ~68 seconds and then frees the associated slot in the taskmanager. After an additional 60 seconds, the jobmanager stops the same taskmanager because the TaskExecutor exceeded the idle timeout. Meanwhile, other taskmanagers are still working to complete the job. Within 10 seconds after the taskmanager container above is stopped, the remaining taskmanagers receive an exception due to the loss of the connection to the stopped taskmanager. These exceptions result in job failure. Relevant logs:
2019-04-03 13:49:00,013 INFO org.apache.flink.yarn.YarnResourceManager - Registering TaskManager with ResourceID container_1553017480503_0158_01_38 (akka.tcp://flink@hadoop4:42745/user/taskmanager_0) at ResourceManager
2019-04-03 13:49:05,900 INFO org.apache.flink.yarn.YarnResourceManager - Registering TaskManager with ResourceID container_1553017480503_0158_01_59 (akka.tcp://flink@hadoop9:55042/user/taskmanager_0) at ResourceManager
2019-04-03 13:48:51,132 INFO org.apache.flink.yarn.YarnResourceManager - Received new container: container_1553017480503_0158_01_77 - Remaining pending container requests: 6
2019-04-03 13:48:52,862 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - -Dlog.file=/hadoop/yarn/log/application_1553017480503_0158/container_1553017480503_0158_01_77/taskmanager.log
2019-04-03 13:48:57,490 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 202 ms). Listening on SocketAddress /192.168.230.69:40140.
2019-04-03 13:49:12,575 INFO org.apache.flink.yarn.YarnResourceManager - Registering TaskManager with ResourceID container_1553017480503_0158_01_77 (akka.tcp://flink@hadoop9:51525/user/taskmanager_0) at ResourceManager
2019-04-03 13:49:12,631 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Allocated slot for AllocationID{42fed3e5a136240c23cc7b394e3249e9}.
2019-04-03 14:58:15,188 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and sending final execution state FINISHED to JobManager for task DataSink (com.anovadata.alexflinklib.sinks.bucketing.BucketingOutputFormat@26874f2c) a4b5fb32830d4561147b2714828109e2.
2019-04-03 14:59:23,049 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Releasing idle slot [AllocationID{42fed3e5a136240c23cc7b394e3249e9}].
2019-04-03 14:59:23,058 INFO org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable - Free slot TaskSlot(index:0, state:ACTIVE, resource profile: ResourceProfile{cpuCores=1.7976931348623157E308, heapMemoryInMB=2147483647, directMemoryInMB=2147483647, nativeMemoryInMB=2147483647, networkMemoryInMB=2147483647}, allocationId: AllocationID{42fed3e5a136240c23cc7b394e3249e9}, jobId: a6c4e367698c15cdf168d19a89faff1d).
2019-04-03 15:00:02,641 INFO org.apache.flink.yarn.YarnResourceManager - Stopping container container_1553017480503_0158_01_77.
2019-04-03 15:00:02,646 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_1553017480503_0158_01_77 because: TaskExecutor exceeded the idle timeout.
2019-04-03 13:48:48,902 INFO org.apache.flink.yarn.YarnTaskExecutorRunner - -Dlog.file=/data1/hadoop/yarn/log/application_1553017480503_0158/container_1553017480503_0158_01_59/taskmanager.log
2019-04-03 14:59:24,677 INFO org.apache.parquet.hadoop.InternalParquetRecordWriter - Flushing mem columnStore to file. allocated memory: 109479981
2019-04-03 15:00:05,696 INFO org.apache.parquet.hadoop.InternalParquetRecordWriter - mem size 135014409 > 134217728: flushing 1930100 records to disk.
2019-04-03 15:00:05,696 INFO org.apache.parquet.hadoop.InternalParquetRecordWriter - Flushing mem columnStore to file. allocated memory: 102677684
2019-04-03 15:00:08,671 ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN Partition -> FlatMap
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Lost connection to task manager 'hadoop9/192.1
Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI
Thanks for putting this proposal together. It would be nice, if you could share a few use case examples (maybe add them as section to the FLIP?). The reason I ask: The table API is immensely useful, but it isn't clear to me what value other language bindings provide without UDF support. With FLIP-38 it will be possible to write a program in Python, but not execute Python functions. Without UDF support, isn't it possible to achieve roughly the same with plain SQL? In which situation would I use the Python API? There was related discussion regarding UDF support in [1]. If the assumption is that such support will be added later, then I would like to circle back to the question why this cannot be built on top of Beam? It would be nice to clarify the bigger goal before embarking for the first milestone. I'm going to comment on other things in the doc. [1] https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E Thomas On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen wrote: > Thanks a lot for driving the FLIP, jincheng. The approach looks > good. Adding multi-lang support sounds a promising direction to expand the > footprint of Flink. Do we have plan for adding Golang support? As many > backend engineers nowadays are familiar with Go, but probably not Java as > much, adding Golang support would significantly reduce their friction to > use Flink. Also, do we have a design for multi-lang UDF support, and what's > timeline for adding DataStream API support? We would like to help and > contribute as well as we do have similar need internally at our company. > Thanks a lot. > > Shuyi > > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun > wrote: > > > Hi All, > > As Xianda brought up in the previous email, There are a large number of > > data analysis users who want flink to support Python. At the Flink API > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API will > > become the first-class citizen. Table API is declarative and can be > > automatically optimized, which is mentioned in the Flink mid-term roadmap > > by Stephan. So we first considering supporting Python at the Table level > to > > cater to the current large number of analytics users. For further promote > > Python support in flink table level. Dian, Wei and I discussed offline a > > bit and came up with an initial features outline as follows: > > > > - Python TableAPI Interface > > Introduce a set of Python Table API interfaces, including interface > > definitions such as Table, TableEnvironment, TableConfig, etc. > > > > - Implementation Architecture > > We will offer two alternative architecture options, one for pure Python > > language support and one for extended multi-language design. > > > > - Job Submission > > Provide a way that can submit(local/remote) Python Table API jobs. > > > > - Python Shell > > Python Shell is to provide an interactive way for users to write and > > execute flink Python Table API jobs. > > > > > > The design document for FLIP-38 can be found here: > > > > > > > https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing > > > > I am looking forward to your comments and feedback. > > > > Best, > > Jincheng > > >
[jira] [Created] (FLINK-12107) Use Proxy For DataDog Validation
Luka Jurukovski created FLINK-12107: --- Summary: Use Proxy For DataDog Validation Key: FLINK-12107 URL: https://issues.apache.org/jira/browse/FLINK-12107 Project: Flink Issue Type: Improvement Reporter: Luka Jurukovski Support for a DataDog metrics proxy was added recently; however, validation of the API keys ignores the proxy. There are circumstances where proxying is used precisely because the service itself is not directly reachable. Currently, the validation pings DataDog directly using the provided API key: https://app.datadoghq.com/api/v1/validate?api_key= -- This message was sent by Atlassian JIRA (v7.6.3#76005)
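A sketch of the kind of change implied, assuming the validation call uses an OkHttp-style client like the rest of the DataDog reporter (names and structure are illustrative, not the reporter's actual code): route the validation request through the same proxy that the metric traffic uses.
{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

final class ProxyValidationSketch {
    /** Validates the API key through the configured proxy instead of directly. */
    static boolean validateApiKey(String apiKey, String proxyHost, int proxyPort) throws IOException {
        OkHttpClient client = new OkHttpClient.Builder()
                .proxy(new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort)))
                .build();
        Request request = new Request.Builder()
                .url("https://app.datadoghq.com/api/v1/validate?api_key=" + apiKey)
                .build();
        try (Response response = client.newCall(request).execute()) {
            return response.isSuccessful();
        }
    }
}
{code}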
contributor permission
Hi, I want to contribute to Apache Flink. Would you please give me the contributor permission? My JIRA ID is shaozhipeng. Thank you very much.
[VOTE] Release 1.8.0, release candidate #5
Hi everyone, Please review and vote on the release candidate 5 for Flink 1.8.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], * the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3], * all artifacts to be deployed to the Maven Central Repository [4], * source code tag "release-1.8.0-rc5" [5], * website pull request listing the new release [6] * website pull request adding announcement blog post [7]. The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. Thanks, Aljoscha [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274 [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc5/ [3] https://dist.apache.org/repos/dist/release/flink/KEYS [4] https://repository.apache.org/content/repositories/orgapacheflink-1216 [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=33943b15f9eca3fafbd88b9315cf9f0a083b6381 [6] https://github.com/apache/flink-web/pull/180 [7] https://github.com/apache/flink-web/pull/179 P.S. The difference to the previous RCs is small; you can fetch the tags and do a "git log release-1.8.0-rc1..release-1.8.0-rc5" to see the difference in commits. It contains fixes for the issues that led to the cancellation of the previous RCs, plus smaller fixes. Most verification/testing that was carried out should apply as-is to this RC. Any functional verification that you did on previous RCs should therefore easily carry over to this one. This does add one new feature for the Kafka connector that was on master for quite a while but not on release-1.8 because of discussions. The NOTICE-binary file is updated to reflect the slight change in ordering; shaded-guava now appears at a different part in the dependency tree.
Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI
Hi Jeff, thank you for your encouragement and valuable comments on the Google doc. I saw that you participated in many pySpark discussions a long time ago; I am very grateful and look forward to your follow-up comments on pyFlink! Thanks, Jincheng Jeff Zhang wrote on Tuesday, April 2, 2019 at 10:53 PM: > Thanks jincheng for driving this. Overall I agree with the approach, just > left a few comments for details. > > > > jincheng sun wrote on Tuesday, April 2, 2019 at 4:03 PM: > > > Hi All, > > As Xianda brought up in the previous email, There are a large number of > > data analysis users who want flink to support Python. At the Flink API > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API will > > become the first-class citizen. Table API is declarative and can be > > automatically optimized, which is mentioned in the Flink mid-term roadmap > > by Stephan. So we first considering supporting Python at the Table level > to > > cater to the current large number of analytics users. For further promote > > Python support in flink table level. Dian, Wei and I discussed offline a > > bit and came up with an initial features outline as follows: > > > > - Python TableAPI Interface > > Introduce a set of Python Table API interfaces, including interface > > definitions such as Table, TableEnvironment, TableConfig, etc. > > > > - Implementation Architecture > > We will offer two alternative architecture options, one for pure Python > > language support and one for extended multi-language design. > > > > - Job Submission > > Provide a way that can submit(local/remote) Python Table API jobs. > > > > - Python Shell > > Python Shell is to provide an interactive way for users to write and > > execute flink Python Table API jobs. > > > > > > The design document for FLIP-38 can be found here: > > > > > > > https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing > > > > I am looking forward to your comments and feedback. > > > > Best, > > Jincheng > > > -- > Best Regards > > Jeff Zhang >
Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI
Hi Shuyi, Glad to see your feedback and the additional requirements about multi-language support! I think the Flink community is very much looking forward to more language support; Golang should of course be on the future support list. The topic of supporting Python on Flink has been researched and discussed in the community for a long time, and I want to support Python in the Table API as the first stage; support for other languages can then be planned. But I have not yet thought in detail about how/when to support Golang, and you are very welcome to share more ideas on how to support it if you have more thoughts. :) Regarding UDFs, we do have some ideas and design attempts, but the performance results of our Python UDF experiments so far are not encouraging. There are also some Python environment management problems that should be considered. After more investigation and experiments, I will share the findings with you in time. Perhaps after the first stage (Python Table API support), we will then have a detailed discussion of UDF support. I think support for the DataStream API should be considered after supporting UDFs, because the DataStream API is mostly used through various functions. We plan to complete the first phase before the release of Flink 1.9, and to start the UDF support after 1.9. Of course, I am very glad to hear that you want to contribute to Flink's multi-language support. Nothing is impossible: if more people are interested in the Python Table API with UDF support and want to contribute more to the community, UDF support may even be there when Flink 1.9 is released. :) Best, Jincheng Shuyi Chen wrote on Thursday, April 4, 2019 at 3:35 AM: > Thanks a lot for driving the FLIP, jincheng. The approach looks > good. Adding multi-lang support sounds a promising direction to expand the > footprint of Flink. Do we have plan for adding Golang support? As many > backend engineers nowadays are familiar with Go, but probably not Java as > much, adding Golang support would significantly reduce their friction to > use Flink. Also, do we have a design for multi-lang UDF support, and what's > timeline for adding DataStream API support? We would like to help and > contribute as well as we do have similar need internally at our company. > Thanks a lot. > > Shuyi > > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun > wrote: > > > Hi All, > > As Xianda brought up in the previous email, There are a large number of > > data analysis users who want flink to support Python. At the Flink API > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API will > > become the first-class citizen. Table API is declarative and can be > > automatically optimized, which is mentioned in the Flink mid-term roadmap > > by Stephan. So we first considering supporting Python at the Table level > to > > cater to the current large number of analytics users. For further promote > > Python support in flink table level. Dian, Wei and I discussed offline a > > bit and came up with an initial features outline as follows: > > > > - Python TableAPI Interface > > Introduce a set of Python Table API interfaces, including interface > > definitions such as Table, TableEnvironment, TableConfig, etc. > > > > - Implementation Architecture > > We will offer two alternative architecture options, one for pure Python > > language support and one for extended multi-language design. > > > > - Job Submission > > Provide a way that can submit(local/remote) Python Table API jobs. 
> > > > - Python Shell > > Python Shell is to provide an interactive way for users to write and > > execute flink Python Table API jobs. > > > > > > The design document for FLIP-38 can be found here: > > > > > > > https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing > > > > I am looking forward to your comments and feedback. > > > > Best, > > Jincheng > > >
[jira] [Created] (FLINK-12108) Simplify splitting expressions into projections, aggregations & windowProperties
Dawid Wysakowicz created FLINK-12108: Summary: Simplify splitting expressions into projections, aggregations & windowProperties Key: FLINK-12108 URL: https://issues.apache.org/jira/browse/FLINK-12108 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dawid Wysakowicz Assignee: Dawid Wysakowicz -- This message was sent by Atlassian JIRA (v7.6.3#76005)