[jira] [Created] (FLINK-12286) The default log4j.properties will overwrite the customized log4j property file.

2019-04-22 Thread Tang Yan (JIRA)
Tang Yan created FLINK-12286:


 Summary: The default log4j.properties will overwrite the 
customized log4j property file.
 Key: FLINK-12286
 URL: https://issues.apache.org/jira/browse/FLINK-12286
 Project: Flink
  Issue Type: Bug
  Components: flink-contrib
Affects Versions: 1.7.2
Reporter: Tang Yan


This is my run command:

bin/flink run -m yarn-cluster -yD env.java.opts="-Dlog4j.configuration=file:/mypath/to/log4j-flink.properties" ./examples/batch/WordCount.jar --input path1 --output path2

Result: The job still used the default log4j.properties in the conf folder.

From the logs below, it seems the job picked up the customized configuration 
first, and then the default property file simply overwrote it (when 
-Dlog4j.configuration is passed twice, the later occurrence takes precedence).

2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx2304m
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/mypath/to/log4j-flink.properties
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/data/yarn/container-logs/application_1555610668906_0067/container_e193_1555610668906_0067_01_01/jobmanager.log
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2019-04-22 06:57:33,436 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)
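For reference, the customized file being passed might look something like this (hypothetical contents, since the report does not include the actual file; the keys are standard log4j 1.x properties):

{code}
# Hypothetical contents of /mypath/to/log4j-flink.properties.
log4j.rootLogger=INFO, file

log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.file=${log.file}
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
{code}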



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12287) A document error in "StreamElement.java"

2019-04-22 Thread YangFei (JIRA)
YangFei created FLINK-12287:
---

 Summary: A document error in "StreamElement.java"
 Key: FLINK-12287
 URL: https://issues.apache.org/jira/browse/FLINK-12287
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Operators
Affects Versions: 1.8.0, 1.7.2
Reporter: YangFei
Assignee: YangFei
 Fix For: 2.0.0


/**
 * Checks whether this element is a record.
 * @return True, if this element is a record, false otherwise.
 */
public final boolean isRecord() {
	return getClass() == StreamRecord.class;
}

/**
 * {color:#FF0000}Checks whether this element is a record.{color}
 * @return True, if this element is a record, false otherwise.
 */
public final boolean isLatencyMarker() {
	return getClass() == LatencyMarker.class;
}

I think the highlighted (red) Javadoc line is not the correct documentation for 
the isLatencyMarker function; it appears to have been copied from isRecord.
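Presumably the corrected comment would simply read (the obvious fix, sketched here):

/**
 * Checks whether this element is a latency marker.
 * @return True, if this element is a latency marker, false otherwise.
 */
public final boolean isLatencyMarker() {
	return getClass() == LatencyMarker.class;
}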

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12288) Bump Calcite dependency to 1.19.0 in blink planner

2019-04-22 Thread Jark Wu (JIRA)
Jark Wu created FLINK-12288:
---

 Summary: Bump Calcite dependency to 1.19.0 in blink planner
 Key: FLINK-12288
 URL: https://issues.apache.org/jira/browse/FLINK-12288
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: Jark Wu
Assignee: Jark Wu


Bump Calcite dependency to 1.19.0 in {{flink-table-planner-blink}} module.
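In Maven terms the bump amounts to the following, assuming the module declares the Calcite version directly (the coordinates are the standard Calcite ones):

{code:xml}
<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <version>1.19.0</version>
</dependency>
{code}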



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Apply for contributor permission

2019-04-22 Thread 张军
Hi,
I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is zhangjun 
Thanks.

I want to apply to be a Flink JIRA contributor

2019-04-22 Thread Armstrong
Hi,

I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is Armstrongya.


[jira] [Created] (FLINK-12289) Fix bugs and typos in Memory manager

2019-04-22 Thread Liya Fan (JIRA)
Liya Fan created FLINK-12289:


 Summary: Fix bugs and typos in Memory manager
 Key: FLINK-12289
 URL: https://issues.apache.org/jira/browse/FLINK-12289
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Operators
Reporter: Liya Fan
Assignee: Liya Fan


According to the JavaDoc, the MemoryManager.release method should throw a 
NullPointerException if the input argument is null, but currently it does not.

In addition, there are some typos in the MemoryManager class.
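A minimal sketch of the guard the JavaDoc implies, using Flink's own Preconditions utility (the method shape below is an assumption; only the null check is the point):

{code}
import org.apache.flink.core.memory.MemorySegment;
import org.apache.flink.util.Preconditions;

class MemoryManagerSketch {

	/** Sketch only: throw a NullPointerException for a null argument, as the JavaDoc promises. */
	void release(MemorySegment segment) {
		Preconditions.checkNotNull(segment, "The memory segment must not be null.");
		// ... actual release logic elided ...
	}
}
{code}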



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12290) Fix the misleading exception message in SharedSlot class

2019-04-22 Thread Liya Fan (JIRA)
Liya Fan created FLINK-12290:


 Summary: Fix the misleading exception message in SharedSlot class
 Key: FLINK-12290
 URL: https://issues.apache.org/jira/browse/FLINK-12290
 Project: Flink
  Issue Type: Bug
Reporter: Liya Fan
Assignee: Liya Fan


The exception message in SharedSlot.releaseSlot is misleading. It says 
"SharedSlot is not empty and released", but it should say "SharedSlot is not 
released and empty."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Flink Filesystems [FLINK-12115] - Review adding Azure FS support

2019-04-22 Thread Piyush Narang
Hi folks,

I put up this review a couple of weeks back and, after an initial set of 
comments, I haven't heard back from anyone 
(https://github.com/apache/flink/pull/8117). I would really appreciate it if 
someone familiar with the Flink file systems code could take a look at this 
review request and provide feedback.

Thanks,

-- Piyush



Re: Support for ProtoBuf data format in StreamingFileSink

2019-04-22 Thread Kailash Dayanand
Friendly reminder. Any thoughts on this approach? (A rough sketch of the builder idea follows the quoted message below.)

On Tue, Apr 9, 2019 at 11:36 AM Kailash Dayanand  wrote:

> cc'ing a few folks who are interested in this discussion.
>
> On Tue, Apr 9, 2019 at 11:35 AM Kailash Dayanand 
> wrote:
>
>> Hello,
>>
>> I am looking to contribute ProtoParquetWriter support which can be used
>> in Bulk format for the StreamingFileSink API. There have been earlier
>> discussions on this in the user mailing list: https://goo.gl/ya2StL and
>> I thought it would be a good addition to have.
>>
>> For the implementation, looking at the current APIs present in
>> ProtoParquetWriter within the parquet project (http://tinyurl.com/y378be42),
>> it looks like there are some differences in the interface between the Avro
>> and Proto writers (ProtoParquetWriter does not have a builder class and
>> does not interface with OutputFile). Due to this, I was looking at directly
>> extending the ParquetWriter within Flink to define the Builder static class
>> and have newer interfaces. This is needed as the bulk writer takes a
>> builder to create the ParquetWriter in the BulkWriter.Factory. (
>> http://tinyurl.com/yyg9cn9b)
>>
>> Any thoughts if this is a reasonable approach?
>>
>> Thanks
>> Kailash
>>
>
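For concreteness, here is a rough sketch of the builder idea under discussion. The class name and package placement are illustrative assumptions, not an agreed design; it assumes parquet-protobuf's ProtoWriteSupport and the ParquetWriter.Builder base class:

import com.google.protobuf.Message;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.api.WriteSupport;
import org.apache.parquet.proto.ProtoWriteSupport;

/** Sketch: a builder-style Proto Parquet writer, analogous to AvroParquetWriter's builder. */
public class ProtoParquetWriterBuilder<T extends Message>
		extends ParquetWriter.Builder<T, ProtoParquetWriterBuilder<T>> {

	private final Class<? extends Message> protoClass;

	public ProtoParquetWriterBuilder(Path path, Class<? extends Message> protoClass) {
		super(path);
		this.protoClass = protoClass;
	}

	@Override
	protected ProtoParquetWriterBuilder<T> self() {
		return this;
	}

	@Override
	protected WriteSupport<T> getWriteSupport(Configuration conf) {
		// Delegate the actual row conversion to parquet-protobuf.
		return new ProtoWriteSupport<>(protoClass);
	}
}

Usage would then be the familiar builder chain, e.g. new ProtoParquetWriterBuilder<MyProto>(path, MyProto.class).build(), which gives the BulkWriter.Factory the builder shape it needs.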


Re: Support for ProtoBuf data format in StreamingFileSink

2019-04-22 Thread Jakob Homan
+1.  Sounds good to me.
-Jakob

On Mon, Apr 22, 2019 at 9:00 AM Kailash Dayanand
 wrote:
>
> Friendly reminder. Any thoughts on this approach?
>
> On Tue, Apr 9, 2019 at 11:36 AM Kailash Dayanand  wrote:
>
> > cc'ing a few folks who are interested in this discussion.
> >
> > On Tue, Apr 9, 2019 at 11:35 AM Kailash Dayanand 
> > wrote:
> >
> >> Hello,
> >>
> >> I am looking to contribute ProtoParquetWriter support which can be used
> >> in Bulk format for the StreamingFileSink API. There have been earlier
> >> discussions on this in the user mailing list: https://goo.gl/ya2StL and
> >> I thought it would be a good addition to have.
> >>
> >> For the implementation, looking at the current APIs present in
> >> ProtoParquetWriter within the parquet project (http://tinyurl.com/y378be42),
> >> it looks like there are some differences in the interface between the Avro
> >> and Proto writers (ProtoParquetWriter does not have a builder class and
> >> does not interface with OutputFile). Due to this, I was looking at directly
> >> extending the ParquetWriter within Flink to define the Builder static class
> >> and have newer interfaces. This is needed as the bulk writer takes a
> >> builder to create the ParquetWriter in the BulkWriter.Factory. (
> >> http://tinyurl.com/yyg9cn9b)
> >>
> >> Any thoughts if this is a reasonable approach?
> >>
> >> Thanks
> >> Kailash
> >>
> >


Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

2019-04-22 Thread jincheng sun
Hi everyone,

Thank you for all of your feedback and comments in the Google doc!

I have updated the Google doc and added the UDFs part. For a short summary:

  - Python Table API - Flink introduces a set of Python Table API interfaces
which align with the Flink Java Table API. It uses the Py4j framework to
communicate between the Python VM and the Java VM.
  - Python user-defined functions - IMO, for the UDF communication framework
we should reuse the existing achievements of Beam as much as possible, and
do our best toward this. The first step is to solve the interface definition
problem described above, i.e. turning `WindowedValue` into `T` in the
FnDataService and BeamFnDataClient interface definitions; this has been
discussed in the Beam community.

The details can be found here:
https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing

So we can start the development of the Table API without UDFs in Flink, and
work with the Beam community to promote Beam's abstractions.

What do you think?

Regards,
Jincheng

jincheng sun wrote on Wed, Apr 17, 2019 at 4:01 PM:

> Hi Stephan,
>
> Thanks for your suggestion and summarize. :)
>
>>  ==> The FLIP should probably reflect the full goal rather than the
>> first implementation step only, this would make sure everyone understands
>> what the final goal of the effort is.
>
>
> I totally agree that we can implement the function in stages, but the FLIP
> needs to reflect the full final goal. I agree with Thomas and you; I will
> add the design of the UDF part later.
>
> Yes, you are right: currently we only consider `flink run` and
> `python-shell` as the job entry points, and we should add a REST API as
> another entry point.
>
>> It would be super cool if the Python API would work seamlessly with all
>> modes of starting Flink jobs.
>
>
> If I understand you correctly: to support the Python Table API on
> Kubernetes, we only need to add (or improve the existing) REST APIs
> corresponding to the Python Table API. Of course, we may also need to
> release a Docker image that supports Python, which would make it easy to
> deploy the Python Table API into Kubernetes.
>
> So, finally, we support the following ways to submit a Python Table API job:
> - Python Shell - interactive development.
> - CLI - submit the job via `flink run`, e.g. to deploy the job into a YARN
> cluster.
> - REST - submit the job via the REST API, e.g. to deploy the job into a
> Kubernetes cluster.
>
> Please correct me if I have misunderstood anything.
>
> Thanks,
> Jincheng
>
>
> Stephan Ewen wrote on Fri, Apr 12, 2019 at 12:22 AM:
>
>> One more thought:
>>
>> The FLIP is very much centered on the CLI and it looks like it has mainly
>> batch jobs and session clusters in mind.
>>
>> In very many cases, especially in streaming cases, the CLI (or shell) is
>> not the entry point for a program.
>> See for example the use of Flink jobs on Kubernetes (Container Mode /
>> Entrypoint).
>>
>> It would be super cool if the Python API would work seamlessly with all
>> modes of starting Flink jobs.
>> That would make it available to all users.
>>
>> On Thu, Apr 11, 2019 at 5:34 PM Stephan Ewen  wrote:
>>
>> > Hi all!
>> >
>> > I think that all the opinions and ideas are not actually in conflict, so
>> > let me summarize what I understand is the proposal:
>> >
>> > *(1) Long-term goal: Full Python Table API with UDFs*
>> >
>> >  To break the implementation effort up into stages, the first step
>> > would be the API without UDFs.
>> >   Because of all the built-in functions in the Table API, this can
>> > already exist by itself, with some value, but ultimately is quite
>> > limited without UDF support.
>> >
>> >  ==> The FLIP should probably reflect the full goal rather than the
>> > first implementation step only; this would make sure everyone
>> > understands what the final goal of the effort is.
>> >
>> >
>> > *(2) Relationship to Beam Language Portability*
>> >
>> > Flink's own Python Table API and Beam-Python on Flink add different
>> > value and are both attractive for different scenarios.
>> >
>> >   - Beam's Python API supports complex pipelines in a similar style as
>> > the DataStream API. There is also the ecosystem of libraries built on
>> > top of that DSL, for example for machine learning.
>> >
>> >   - Flink's Python Table API builds mostly relational expressions, plus
>> > some UDFs. Most of the Python code never executes in Python, though. It
>> > is geared at use cases similar to Flink's Table API.
>> >
>> > Both approaches mainly differ in how the streaming DAG is built from
>> > Python code and received by the JVM.
>> >
>> > In previous discussions, we concluded that for inter-process data
>> > exchange (JVM <> Python), we want to share code with Beam.
>> > That part is possibly the most crucial piece to getting performance out
>> > of the Python DSL, so it will benefit from sharing development,
>> > optimizations, etc.
>> >
>> > Best,
>> > Stephan
>> >
>> >
>> >
>> >
>> > On Fri, Apr 5, 2019 a

[jira] [Created] (FLINK-12291) Add benchmarks for the input processors

2019-04-22 Thread Haibo Sun (JIRA)
Haibo Sun created FLINK-12291:
-

 Summary: Add benchmarks for the input processors
 Key: FLINK-12291
 URL: https://issues.apache.org/jira/browse/FLINK-12291
 Project: Flink
  Issue Type: Task
  Components: Runtime / Operators
Reporter: Haibo Sun
Assignee: Haibo Sun


Adds benchmarks for `StreamTwoInputProcessor` and 
`StreamTwoInputSelectableProcessor` to ensure that 
`StreamTwoInputSelectableProcessor`'s throughput is the same, or that any 
regression is acceptable, in the case of a constant `InputSelection.ALL`.
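A hedged sketch of the kind of JMH benchmark this implies; the summing loop is only a stand-in for pushing records through a processor, since wiring up the two real processors is elided here:

{code}
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class TwoInputProcessorBenchmark {

	private long[] records;

	@Setup
	public void setup() {
		records = new long[10_000];
		for (int i = 0; i < records.length; i++) {
			records[i] = i;
		}
	}

	@Benchmark
	public long processWithConstantInputSelectionAll() {
		// Stand-in for feeding both inputs to a processor; the real benchmark
		// would instantiate StreamTwoInputProcessor and
		// StreamTwoInputSelectableProcessor and compare their throughput.
		long sum = 0;
		for (long record : records) {
			sum += record;
		}
		return sum;
	}
}
{code}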



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12292) Add benchmarks for the input processors

2019-04-22 Thread Haibo Sun (JIRA)
Haibo Sun created FLINK-12292:
-

 Summary: Add benchmarks for the input processors
 Key: FLINK-12292
 URL: https://issues.apache.org/jira/browse/FLINK-12292
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Operators
Reporter: Haibo Sun
Assignee: Haibo Sun


Adds benchmarks for `StreamTwoInputProcessor` and 
`StreamTwoInputSelectableProcessor` to ensure that 
`StreamTwoInputSelectableProcessor`'s throughput is the same, or that any 
regression is acceptable, in the case of a constant `InputSelection.ALL`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12293) Fix some comment typos in flink-streaming-java.

2019-04-22 Thread Ji Liu (JIRA)
Ji Liu created FLINK-12293:
--

 Summary: Fix some comment typos in flink-streaming-java.
 Key: FLINK-12293
 URL: https://issues.apache.org/jira/browse/FLINK-12293
 Project: Flink
  Issue Type: Improvement
Reporter: Ji Liu
Assignee: Ji Liu


Some comment typos in flink-streaming-java should be fixed. I will provide a 
PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12294) kafka consumer, data locality

2019-04-22 Thread Sergey (JIRA)
Sergey created FLINK-12294:
--

 Summary: kafka consumer, data locality
 Key: FLINK-12294
 URL: https://issues.apache.org/jira/browse/FLINK-12294
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka, Runtime / Coordination
Reporter: Sergey


Add a flag (default value: false) controlling whether topic partitions are 
already grouped by the key. When this parameter is set to true, the unnecessary 
shuffle/re-sorting operation is excluded. As an example, say we have clients' 
payment transactions in a Kafka topic. We group by clientId (transactions with 
the same clientId go to one Kafka topic partition) and the task is to find the 
maximum transaction per client in sliding windows. In map/reduce terms there is 
no need to shuffle data between all topic consumers; at most it may be worth 
re-sorting within each consumer to gain some speedup from an increased number 
of executors over each partition's data. With N messages in a partition it will 
be just N operations instead of the N*ln(N) of the current implementation with 
shuffle/re-sorting. For windows with thousands of events, that is a tenfold 
gain in execution speed.
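For reference, a sketch of the closest existing hook, the experimental DataStreamUtils.reinterpretAsKeyedStream, which declares a stream as already partitioned by key and skips the shuffle. Note it assumes the partitioning matches Flink's internal key-group assignment, which plain Kafka partitioning generally does not; closing that gap is what this ticket is about. The source below is a stand-in for a Kafka consumer:

{code}
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PrePartitionedExample {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// Stand-in for a Kafka source of (clientId, amount) records that is
		// already partitioned by clientId on the broker side.
		DataStream<Tuple2<String, Long>> transactions = env.fromElements(
				Tuple2.of("client-1", 10L),
				Tuple2.of("client-1", 42L),
				Tuple2.of("client-2", 7L));

		// Declare the stream as already keyed by clientId, skipping the shuffle.
		KeyedStream<Tuple2<String, Long>, String> keyed =
				DataStreamUtils.reinterpretAsKeyedStream(transactions, t -> t.f0, Types.STRING);

		// Rolling maximum of the amount per client.
		keyed.maxBy(1).print();

		env.execute("pre-partitioned-example");
	}
}
{code}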



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12295) use retract aggregate function instead of regular aggregate function for retractable aggregate in code gen

2019-04-22 Thread godfrey he (JIRA)
godfrey he created FLINK-12295:
--

 Summary: use retract aggregate function instead of regular 
aggregate function for retractable aggregate in code gen
 Key: FLINK-12295
 URL: https://issues.apache.org/jira/browse/FLINK-12295
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Runtime
Reporter: godfrey he
Assignee: godfrey he


After {{FlinkRelMdModifiedMonotonicity}} was introduced, an aggregate function 
whose result value changes monotonically (increasing or decreasing) can ignore 
retraction messages. We could choose the regular aggregate function instead of 
the retract aggregate function for those aggregate functions in code-gen. 
Currently, this requires that the regular aggregate function implement the 
{{retractExpressions}} method without throwing any exception. A better approach 
is for the retractable aggregate operator not to call the 
{{retractExpressions}} method on regular aggregate functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task

2019-04-22 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-12296:
-

 Summary: Data loss silently in RocksDBStateBackend when more than 
one operator chained in a single task 
 Key: FLINK-12296
 URL: https://issues.apache.org/jira/browse/FLINK-12296
 Project: Flink
  Issue Type: Test
  Components: Runtime / State Backends
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


As described on the mailing list [1], there may be a problem when more than one 
operator is chained in a single task and all of the operators have state: data 
can be lost silently.

 

[1] 
https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12297) We should clean the closure for OutputTags

2019-04-22 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-12297:


 Summary: We should clean the closure for OutputTags
 Key: FLINK-12297
 URL: https://issues.apache.org/jira/browse/FLINK-12297
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.8.0
Reporter: Dawid Wysakowicz
 Fix For: 1.9.0, 1.8.1


Right now we do not invoke the closure cleaner on OutputTags. Therefore, code 
such as the following:
{code}
@Test
public void testFlatSelectSerialization() throws Exception {
	StreamExecutionEnvironment env =
		StreamExecutionEnvironment.getExecutionEnvironment();
	DataStreamSource<Integer> elements = env.fromElements(1, 2, 3);
	OutputTag<Integer> outputTag = new OutputTag<Integer>("AAA") {};
	CEP.pattern(elements, Pattern.<Integer>begin("A")).flatSelect(
		outputTag,
		new PatternFlatTimeoutFunction<Integer, Integer>() {
			@Override
			public void timeout(
					Map<String, List<Integer>> pattern,
					long timeoutTimestamp,
					Collector<Integer> out) throws Exception {

			}
		},
		new PatternFlatSelectFunction<Integer, Object>() {
			@Override
			public void flatSelect(Map<String, List<Integer>> pattern, Collector<Object> out) throws Exception {

			}
		}
	);

	env.execute();
}
{code}

will fail with a {{The implementation of the PatternFlatSelectAdapter is not 
serializable.}} exception.
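A minimal sketch of the fix this implies, assuming the Flink 1.8 ClosureCleaner signature and that the cleaning happens wherever CEP wraps the user functions (the exact location is an assumption):

{code}
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.java.ClosureCleaner;
import org.apache.flink.util.OutputTag;

class OutputTagCleaningSketch {

	void sketch() {
		// An anonymous subclass declared in an instance method captures the
		// enclosing instance, which is what makes the wrapping adapter
		// non-serializable.
		OutputTag<Integer> outputTag = new OutputTag<Integer>("AAA") {};

		// The fix implied by this ticket: run the closure cleaner over the
		// tag, as is already done for user functions.
		ClosureCleaner.clean(outputTag, ExecutionConfig.ClosureCleanerLevel.RECURSIVE, true);
	}
}
{code}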



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)