[jira] [Created] (FLINK-12286) The default log4j.properties will overwrite the customized log4j property file.
Tang Yan created FLINK-12286: Summary: The default log4j.properties will overwrite the customized log4j property file. Key: FLINK-12286 URL: https://issues.apache.org/jira/browse/FLINK-12286 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 1.7.2 Reporter: Tang Yan

This is my run command:

bin/flink run -m yarn-cluster -yD env.java.opts="-Dlog4j.configuration=file:/mypath/to/log4j-flink.properties" ./examples/batch/WordCount.jar --input path1 --output path2

Result: The job still used the default log4j.properties in the conf folder. From the logs below, it seems the job loaded the customized configuration first, and then the default property file overwrote it.

2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx2304m
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/mypath/to/log4j-flink.properties
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/data/yarn/container-logs/application_1555610668906_0067/container_e193_1555610668906_0067_01_01/jobmanager.log
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
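For illustration (not part of the original report; this is standard JVM behavior, not Flink-specific): when the same system property is passed twice on a JVM command line, the last occurrence wins, which matches the log above where the default -Dlog4j.configuration=file:log4j.properties appears after the customized one. A minimal standalone sketch:

{code}
// PropertyCheck.java -- hypothetical standalone sketch, not Flink code.
// Run it with both flags in the order shown in the log above:
//   java -Dlog4j.configuration=file:/mypath/to/log4j-flink.properties \
//        -Dlog4j.configuration=file:log4j.properties PropertyCheck
public class PropertyCheck {
    public static void main(String[] args) {
        // Prints "file:log4j.properties": the later (default) flag wins,
        // so log4j loads the default configuration.
        System.out.println(System.getProperty("log4j.configuration"));
    }
}
{code}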
[jira] [Created] (FLINK-12287) A document error in "StreamElement.java"
YangFei created FLINK-12287: --- Summary: A document error in "StreamElement.java" Key: FLINK-12287 URL: https://issues.apache.org/jira/browse/FLINK-12287 Project: Flink Issue Type: Improvement Components: Runtime / Operators Affects Versions: 1.8.0, 1.7.2 Reporter: YangFei Assignee: YangFei Fix For: 2.0.0

/**
 * Checks whether this element is a record.
 *
 * @return True, if this element is a record, false otherwise.
 */
public final boolean isRecord() {
	return getClass() == StreamRecord.class;
}

/**
 * Checks whether this element is a record.
 *
 * @return True, if this element is a record, false otherwise.
 */
public final boolean isLatencyMarker() {
	return getClass() == LatencyMarker.class;
}

I think the description "Checks whether this element is a record." is not the correct documentation for the function isLatencyMarker.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
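The fix would presumably be just the copy-paste correction below (a sketch of the corrected Javadoc, not the committed patch):

{code}
/**
 * Checks whether this element is a latency marker.
 *
 * @return True, if this element is a latency marker, false otherwise.
 */
public final boolean isLatencyMarker() {
	return getClass() == LatencyMarker.class;
}
{code}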
[jira] [Created] (FLINK-12288) Bump Calcite dependency to 1.19.0 in blink planner
Jark Wu created FLINK-12288: --- Summary: Bump Calcite dependency to 1.19.0 in blink planner Key: FLINK-12288 URL: https://issues.apache.org/jira/browse/FLINK-12288 Project: Flink Issue Type: New Feature Components: Table SQL / Planner Reporter: Jark Wu Assignee: Jark Wu Bump Calcite dependency to 1.19.0 in {{flink-table-planner-blink}} module. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Apply for contributor permission
Hi, I want to contribute to Apache Flink. Would you please give me the contributor permission? My JIRA ID is zhangjun. Thanks.
I want to apply for Flink JIRA contributor permission
Hi, I want to contribute to Apache Flink. Would you please give me the contributor permission? My JIRA ID is Armstrongya.
[jira] [Created] (FLINK-12289) Fix bugs and typos in Memory manager
Liya Fan created FLINK-12289: Summary: Fix bugs and typos in Memory manager Key: FLINK-12289 URL: https://issues.apache.org/jira/browse/FLINK-12289 Project: Flink Issue Type: Bug Components: Runtime / Operators Reporter: Liya Fan Assignee: Liya Fan

According to the JavaDoc, the MemoryManager.release method should throw an NPE if the input argument is null. In addition, there are some typos in the MemoryManager class.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
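For illustration, the documented contract could be enforced with a fail-fast null check (a sketch: only the method name comes from the issue, the body is assumed):

{code}
// Sketch only: enforce the JavaDoc contract that release(null) throws
// a NullPointerException instead of silently returning.
public void release(MemorySegment segment) {
    if (segment == null) {
        throw new NullPointerException("segment must not be null");
    }
    // ... actual release logic unchanged ...
}
{code}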
[jira] [Created] (FLINK-12290) Fix the misleading exception message in SharedSlot class
Liya Fan created FLINK-12290: Summary: Fix the misleading exception message in SharedSlot class Key: FLINK-12290 URL: https://issues.apache.org/jira/browse/FLINK-12290 Project: Flink Issue Type: Bug Reporter: Liya Fan Assignee: Liya Fan

The exception message in SharedSlot.releaseSlot is misleading. It says "SharedSlot is not empty and released", but the condition should be "SharedSlot is not released and empty."

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
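For illustration only (everything except the corrected message text is assumed), the fix could be as small as swapping the string:

{code}
// Sketch, not the actual Flink code: same check as before, with the
// message text matching the real condition.
throw new IllegalStateException("SharedSlot is not released and empty.");
{code}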
Flink Filesystems [FLINK-12115] - Review adding Azure FS support
Hi folks, I put up this review a couple of weeks back and, after an initial set of comments, I haven't heard back from anyone (https://github.com/apache/flink/pull/8117). Would really appreciate it if someone familiar with the Flink file systems code could take a look at this review request and provide feedback. Thanks, -- Piyush
Re: Support for ProtoBuf data format in StreamingFileSink
Friendly reminder. Any thoughts on this approach?

On Tue, Apr 9, 2019 at 11:36 AM Kailash Dayanand wrote:
> cc'ing a few folks who are interested in this discussion.
>
> On Tue, Apr 9, 2019 at 11:35 AM Kailash Dayanand wrote:
>
>> Hello,
>>
>> I am looking to contribute ProtoParquetWriter support which can be used
>> in the Bulk format for the StreamingFileSink API. There have been earlier
>> discussions on this in the user mailing list: https://goo.gl/ya2StL and
>> I thought it would be a good addition to have.
>>
>> For the implementation, looking at the current APIs present in
>> ProtoParquetWriter within the parquet project (http://tinyurl.com/y378be42),
>> it looks like there is some difference in the interfaces between the Avro
>> and Proto writers (ProtoParquetWriter has no builder class and does not
>> interface with OutputFile). Due to this, I was looking at directly
>> extending the ParquetWriter within Flink to define the Builder static
>> class and have the newer interfaces. This is needed because the bulk
>> writer takes a builder to create the ParquetWriter in the
>> BulkWriter.Factory. (http://tinyurl.com/yyg9cn9b)
>>
>> Any thoughts on whether this is a reasonable approach?
>>
>> Thanks,
>> Kailash
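One way the "extend ParquetWriter to define the Builder" idea could look (a sketch only: the builder class name here is hypothetical, while ParquetWriter.Builder and ProtoWriteSupport are the real Parquet APIs):

{code}
import com.google.protobuf.Message;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.api.WriteSupport;
import org.apache.parquet.proto.ProtoWriteSupport;

// Hypothetical builder giving proto messages the builder-style API that
// AvroParquetWriter already has, so a BulkWriter.Factory can call build()
// once per part file.
public class ProtoParquetWriterBuilder<T extends Message>
        extends ParquetWriter.Builder<T, ProtoParquetWriterBuilder<T>> {

    private final Class<T> messageClass;

    public ProtoParquetWriterBuilder(Path path, Class<T> messageClass) {
        super(path);
        this.messageClass = messageClass;
    }

    @Override
    protected ProtoParquetWriterBuilder<T> self() {
        return this;
    }

    @Override
    protected WriteSupport<T> getWriteSupport(Configuration conf) {
        // ProtoWriteSupport ships with parquet-protobuf and does the
        // actual message-to-Parquet mapping.
        return new ProtoWriteSupport<>(messageClass);
    }
}
{code}

A BulkWriter.Factory could then call build() on such a builder for each part file, mirroring what Flink's ParquetAvroWriters does for Avro.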
Re: Support for ProtoBuf data format in StreamingFileSink
+1. Sounds good to me.

-Jakob

On Mon, Apr 22, 2019 at 9:00 AM Kailash Dayanand wrote:
>
> Friendly reminder. Any thoughts on this approach?
>
> On Tue, Apr 9, 2019 at 11:36 AM Kailash Dayanand wrote:
>
> > cc'ing a few folks who are interested in this discussion.
> >
> > On Tue, Apr 9, 2019 at 11:35 AM Kailash Dayanand wrote:
> >
> >> Hello,
> >>
> >> I am looking to contribute ProtoParquetWriter support which can be used
> >> in the Bulk format for the StreamingFileSink API. There have been earlier
> >> discussions on this in the user mailing list: https://goo.gl/ya2StL and
> >> I thought it would be a good addition to have.
> >>
> >> For the implementation, looking at the current APIs present in
> >> ProtoParquetWriter within the parquet project (http://tinyurl.com/y378be42),
> >> it looks like there is some difference in the interfaces between the Avro
> >> and Proto writers (ProtoParquetWriter has no builder class and does not
> >> interface with OutputFile). Due to this, I was looking at directly
> >> extending the ParquetWriter within Flink to define the Builder static
> >> class and have the newer interfaces. This is needed because the bulk
> >> writer takes a builder to create the ParquetWriter in the
> >> BulkWriter.Factory. (http://tinyurl.com/yyg9cn9b)
> >>
> >> Any thoughts on whether this is a reasonable approach?
> >>
> >> Thanks,
> >> Kailash
Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI
Hi everyone,

Thank you for all of your feedback and comments in the google doc! I have updated the google doc and added the UDFs part. For a short summary:

- Python Table API - Flink introduces a set of Python Table API interfaces which align with the Flink Java Table API. It uses the Py4j framework to communicate between the Python VM and the Java VM.
- Python user-defined functions - IMO, for the UDF communication framework, we will try to reuse the existing achievements of Beam as much as possible, and do our best for this. The first step is to solve the above interface definition problem, i.e., turning `WindowedValue<T>` into `T` in the FnDataService and BeamFnDataClient interface definitions; this has been discussed in the Beam community. The details can be found here: https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing

So we can start the development of the Table API without UDFs in Flink, and work with the Beam community to promote the abstraction of Beam.

What do you think?

Regards, Jincheng

jincheng sun wrote on Wed, Apr 17, 2019 at 4:01 PM:
> Hi Stephan,
>
> Thanks for your suggestions and summary. :)
>
> ==> The FLIP should probably reflect the full goal rather than the
>> first implementation step only, this would make sure everyone understands
>> what the final goal of the effort is.
>
> I totally agree that we can implement the function in stages, but the FLIP
> needs to reflect the full final goal. I agree with Thomas and you; I will
> add the design of the UDF part later.
>
> Yes, you are right: currently we only consider `flink run` and
> `python-shell` as the job entry points, and we should add the REST API as
> another entry point.
>
> It would be super cool if the Python API would work seamlessly with all
>> modes of starting Flink jobs.
>
> If I understand you correctly, to support the Python Table API in
> Kubernetes, we only need to add (or improve the existing) REST API
> corresponding to the Python Table API; of course, we may also need to
> release a Docker image that supports Python, which would make it easy to
> deploy the Python Table API into Kubernetes.
>
> So, finally, we support the following ways to submit a Python Table API job:
> - Python Shell - interactive development.
> - CLI - submit the job via `flink run`, e.g. deploy the job into a YARN cluster.
> - REST - submit the job via the REST API, e.g. deploy the job into a Kubernetes cluster.
>
> Please correct me if I have misunderstood anything.
>
> Thanks,
> Jincheng
>
> Stephan Ewen wrote on Fri, Apr 12, 2019 at 12:22 AM:
>
>> One more thought:
>>
>> The FLIP is very much centered on the CLI and it looks like it has mainly
>> batch jobs and session clusters in mind.
>>
>> In very many cases, especially in streaming cases, the CLI (or shell) is
>> not the entry point for a program.
>> See for example the use of Flink jobs on Kubernetes (Container Mode /
>> Entrypoint).
>>
>> It would be super cool if the Python API would work seamlessly with all
>> modes of starting Flink jobs.
>> That would make it available to all users.
>>
>> On Thu, Apr 11, 2019 at 5:34 PM Stephan Ewen wrote:
>>
>> > Hi all!
>> >
>> > I think that all the opinions and ideas are not actually in conflict, so
>> > let me summarize what I understand is the proposal:
>> >
>> > *(1) Long-term goal: Full Python Table API with UDFs*
>> >
>> > To break the implementation effort up into stages, the first step
>> > would be the API without UDFs.
>> > Because of all the built-in functions in the Table API, this can
>> > already exist by itself, with some value, but ultimately is quite
>> > limited without UDF support.
>> >
>> > ==> The FLIP should probably reflect the full goal rather than the
>> > first implementation step only; this would make sure everyone
>> > understands what the final goal of the effort is.
>> >
>> > *(2) Relationship to Beam Language Portability*
>> >
>> > Flink's own Python Table API and Beam-Python on Flink add different
>> > value and are both attractive for different scenarios.
>> >
>> > - Beam's Python API supports complex pipelines in a similar style as the
>> > DataStream API. There is also the ecosystem of libraries built on top of
>> > that DSL, for example for machine learning.
>> >
>> > - Flink's Python Table API builds mostly relational expressions, plus
>> > some UDFs. Most of the Python code never executes in Python, though. It
>> > is geared at use cases similar to Flink's Table API.
>> >
>> > Both approaches mainly differ in how the streaming DAG is built from
>> > Python code and received by the JVM.
>> >
>> > In previous discussions, we concluded that for inter-process data
>> > exchange (JVM <> Python), we want to share code with Beam.
>> > That part is possibly the most crucial piece to getting performance out
>> > of the Python DSL, so it will benefit from sharing development,
>> > optimizations, etc.
>> >
>> > Best,
>> > Stephan
>> >
>> > On Fri, Apr 5, 2019 a
[jira] [Created] (FLINK-12291) Add benchmarks for the input processors
Haibo Sun created FLINK-12291: - Summary: Add benchmarks for the input processors Key: FLINK-12291 URL: https://issues.apache.org/jira/browse/FLINK-12291 Project: Flink Issue Type: Task Components: Runtime / Operators Reporter: Haibo Sun Assignee: Haibo Sun

Add benchmarks for `StreamTwoInputProcessor` and `StreamTwoInputSelectableProcessor` to ensure that `StreamTwoInputSelectableProcessor`'s throughput is the same, or that any regression is acceptable, in the case of constant `InputSelection.ALL`.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12292) Add benchmarks for the input processors
Haibo Sun created FLINK-12292: - Summary: Add benchmarks for the input processors Key: FLINK-12292 URL: https://issues.apache.org/jira/browse/FLINK-12292 Project: Flink Issue Type: Sub-task Components: Runtime / Operators Reporter: Haibo Sun Assignee: Haibo Sun

Add benchmarks for `StreamTwoInputProcessor` and `StreamTwoInputSelectableProcessor` to ensure that `StreamTwoInputSelectableProcessor`'s throughput is the same, or that any regression is acceptable, in the case of constant `InputSelection.ALL`.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12293) Fix some comment typos in flink-streaming-java.
Ji Liu created FLINK-12293: -- Summary: Fix some comment typos in flink-streaming-java. Key: FLINK-12293 URL: https://issues.apache.org/jira/browse/FLINK-12293 Project: Flink Issue Type: Improvement Reporter: Ji Liu Assignee: Ji Liu

Some comment typos in flink-streaming-java should be fixed. I will provide a PR.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12294) kafka consumer, data locality
Sergey created FLINK-12294: -- Summary: kafka consumer, data locality Key: FLINK-12294 URL: https://issues.apache.org/jira/browse/FLINK-12294 Project: Flink Issue Type: Improvement Components: Connectors / Kafka, Runtime / Coordination Reporter: Sergey

Add a flag (default value: false) controlling whether topic partitions are already grouped by key. When this parameter is set to true, the unnecessary shuffle/resorting operation can be excluded.

As an example, say we have clients' payment transactions in a Kafka topic. We group by clientId (transactions with the same clientId go to one Kafka topic partition), and the task is to find the max transaction per client in sliding windows. In map/reduce terms, there is no need to shuffle data between all topic consumers, though it may be worth doing the sorting within each consumer to gain some speedup from the increased number of executors over each partition's data. With N messages in a partition, this takes just N operations instead of the N*ln(N) of the current implementation with shuffle/resorting. For windows with thousands of events, that is a tenfold gain in execution speed.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
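For reference (not from the original proposal), Flink already has an experimental hook in this direction: DataStreamUtils.reinterpretAsKeyedStream, which tells Flink to trust an existing partitioning instead of inserting a keyBy() shuffle. A sketch, where Transaction and getClientId() are hypothetical names:

{code}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class PrePartitionedExample {

    // 'Transaction' and getClientId() are made-up types for this sketch;
    // 'transactions' would be the stream read from the Kafka topic that
    // the producer already partitions by clientId.
    public static KeyedStream<Transaction, String> keyWithoutShuffle(
            DataStream<Transaction> transactions) {
        // Trust the existing partitioning instead of shuffling. If the
        // input is NOT really partitioned by clientId, results are
        // silently wrong -- hence the value of an explicit opt-in flag.
        return DataStreamUtils.reinterpretAsKeyedStream(
                transactions, Transaction::getClientId);
    }
}
{code}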
[jira] [Created] (FLINK-12295) use retract aggregate function instead of regular aggregate function for retractable aggregate in code gen
godfrey he created FLINK-12295: -- Summary: use retract aggregate function instead of regular aggregate function for retractable aggregate in code gen Key: FLINK-12295 URL: https://issues.apache.org/jira/browse/FLINK-12295 Project: Flink Issue Type: New Feature Components: Table SQL / Runtime Reporter: godfrey he Assignee: godfrey he

After {{FlinkRelMdModifiedMonotonicity}} was introduced, an aggregate function whose result value changes monotonically (only increasing or only decreasing) can ignore retraction messages. We could choose the regular aggregate function instead of the retract aggregate function for those aggregate functions in code-gen. Currently, this requires that the regular aggregate function implement the {{retractExpressions}} method and not throw any exception. A better approach is for the retractable aggregate operator not to call the {{retractExpressions}} method for regular aggregate functions.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
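For background (illustrative only: {{retractExpressions}} above belongs to the blink planner's declarative aggregates, while this sketch shows the imperative Table API convention for the same retract concept):

{code}
import org.apache.flink.table.functions.AggregateFunction;

// Illustrative count aggregate: accumulate/retract are resolved by
// method-name convention in the Table API, not via an interface.
public class CountAgg extends AggregateFunction<Long, CountAgg.CountAcc> {

    public static class CountAcc {
        public long count;
    }

    @Override
    public CountAcc createAccumulator() {
        return new CountAcc();
    }

    @Override
    public Long getValue(CountAcc acc) {
        return acc.count;
    }

    public void accumulate(CountAcc acc, Object value) {
        acc.count++;
    }

    // Only needed when the input can contain retraction messages; when
    // the planner can prove the result is monotonic (as via
    // FlinkRelMdModifiedMonotonicity), a regular aggregate without this
    // method suffices.
    public void retract(CountAcc acc, Object value) {
        acc.count--;
    }
}
{code}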
[jira] [Created] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task
Congxian Qiu(klion26) created FLINK-12296: - Summary: Data loss silently in RocksDBStateBackend when more than one operator chained in a single task Key: FLINK-12296 URL: https://issues.apache.org/jira/browse/FLINK-12296 Project: Flink Issue Type: Test Components: Runtime / State Backends Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26)

As the mailing list thread said [1], there may be a problem when more than one operator is chained in a single task and all the operators have state: data will be lost silently.

[1] https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12297) We should clean the closure for OutputTags
Dawid Wysakowicz created FLINK-12297: Summary: We should clean the closure for OutputTags Key: FLINK-12297 URL: https://issues.apache.org/jira/browse/FLINK-12297 Project: Flink Issue Type: Bug Components: Library / CEP Affects Versions: 1.8.0 Reporter: Dawid Wysakowicz Fix For: 1.9.0, 1.8.1

Right now we do not invoke the closure cleaner on output tags. Therefore code such as:

{code}
@Test
public void testFlatSelectSerialization() throws Exception {
	StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
	DataStreamSource<Integer> elements = env.fromElements(1, 2, 3);
	OutputTag<Integer> outputTag = new OutputTag<Integer>("AAA") {};

	CEP.pattern(elements, Pattern.<Integer>begin("A")).flatSelect(
		outputTag,
		new PatternFlatTimeoutFunction<Integer, Integer>() {
			@Override
			public void timeout(
					Map<String, List<Integer>> pattern,
					long timeoutTimestamp,
					Collector<Integer> out) throws Exception {
			}
		},
		new PatternFlatSelectFunction<Integer, Integer>() {
			@Override
			public void flatSelect(
					Map<String, List<Integer>> pattern,
					Collector<Integer> out) throws Exception {
			}
		}
	);

	env.execute();
}
{code}

will fail with a {{The implementation of the PatternFlatSelectAdapter is not serializable.}} exception.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
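Until the closure cleaner is applied to output tags, a user-side workaround (a sketch, not from the report) is to create the anonymous OutputTag subclass in a static context, so it captures no enclosing instance:

{code}
// The anonymous subclass created in a static initializer has no
// (potentially non-serializable) enclosing test instance to capture,
// so the resulting PatternFlatSelectAdapter serializes fine.
private static final OutputTag<Integer> OUTPUT_TAG =
        new OutputTag<Integer>("AAA") {};
{code}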