Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
Thanks for the update. The proposal looks good to me now. Best, Jark On Tue, 5 Jan 2021 at 14:44, Jingsong Li wrote: > Thanks for your proposal! Sebastian. > > +1 for SupportsAggregatePushDown. The above wonderful discussion has > solved many of my concerns. > > ## Semantic problems > > We may

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Arvid Heise
Hi Yun, 1. I'd think that this is an orthogonal issue, which I'd solve separately. My gut feeling says that this is something we should only address for new sinks where we decouple the semantics of commits and checkpoints anyways. @Aljoscha Krettek any idea on this one? 2. I'm not sure I get it

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Till Rohrmann
+1 for Jark's and Xintong's proposal. Would the default weight for OPERATOR and STATE_BACKEND be the same value? Cheers, Till On Tue, Jan 5, 2021 at 6:39 AM Jingsong Li wrote: > +1 for allowing streaming operators to use managed memory. > > The memory use of streams requires some hierarchy, an

[PSA] Configure "Save Actions" only for Java files

2021-01-05 Thread Aljoscha Krettek
If you're using "Save Actions" to auto-format your Java code, as recommended in [1], you should add a regex in the settings to make sure that this only formats Java code. Otherwise you will get weird results when IntelliJ also formats XML, Markdown or Scala files for you. Best, Aljoscha [1]

Re: [PSA] Configure "Save Actions" only for Java files

2021-01-05 Thread Till Rohrmann
This is very helpful. Thanks a lot Aljoscha! Cheers, Till On Tue, Jan 5, 2021 at 10:59 AM Aljoscha Krettek wrote: > If you're using "Save Actions" to auto-format your Java code, as > recommended in [1], you should add a regex in the settings to make sure > that this only formats Java code. Othe

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Xintong Song
> > Would the default weight for OPERATOR and STATE_BACKEND be the same value? > I would say yes, to align with previous behaviors. Thank you~ Xintong Song On Tue, Jan 5, 2021 at 5:51 PM Till Rohrmann wrote: > +1 for Jark's and Xintong's proposal. > > Would the default weight for OPERATOR

[CVE-2020-17519] Apache Flink directory traversal attack: reading remote files through the REST API

2021-01-05 Thread Robert Metzger
CVE-2020-17519: Apache Flink directory traversal attack: reading remote files through the REST API Vendor: The Apache Software Foundation Versions Affected: 1.11.0, 1.11.1, 1.11.2 Description: A change introduced in Apache Flink 1.11.0 (and released in 1.11.1 and 1.11.2 as well) allows attackers

[CVE-2020-17518] Apache Flink directory traversal attack: remote file writing through the REST API

2021-01-05 Thread Robert Metzger
CVE-2020-17518: Apache Flink directory traversal attack: remote file writing through the REST API Vendor: The Apache Software Foundation Versions Affected: 1.5.1 to 1.11.2 Description: Flink 1.5.1 introduced a REST handler that allows you to write an uploaded file to an arbitrary location on the

[jira] [Created] (FLINK-20850) Analyze whether CoLocationConstraints and CoLocationGroup can be removed

2021-01-05 Thread Matthias (Jira)
Matthias created FLINK-20850: Summary: Analyze whether CoLocationConstraints and CoLocationGroup can be removed Key: FLINK-20850 URL: https://issues.apache.org/jira/browse/FLINK-20850 Project: Flink

Apache Pinot Sink

2021-01-05 Thread Poerschke, Mats
Hi all, we want to contribute a sink connector for Apache Pinot. The following briefly describes the planned control flow. Please feel free to comment on any of its aspects. Background Apache Pinot is a large-scale real-time data ingestion engine working on data segments internally. The contro

Re: Apache Pinot Sink

2021-01-05 Thread Poerschke, Mats
Just as a short addition: We plan to contribute the sink to Apache Bahir. Best regards Mats Pörschke > On 5. Jan 2021, at 13:21, Poerschke, Mats > wrote: > > Hi all, > > we want to contribute a sink connector for Apache Pinot. The following > briefly describes the planned control flow. Pleas

[jira] [Created] (FLINK-20851) flink datagen produce NULL value

2021-01-05 Thread appleyuchi (Jira)
appleyuchi created FLINK-20851: -- Summary: flink datagen produce NULL value Key: FLINK-20851 URL: https://issues.apache.org/jira/browse/FLINK-20851 Project: Flink Issue Type: Bug Compon

[jira] [Created] (FLINK-20852) Enrich back pressure stats per subtask in the WebUI

2021-01-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-20852: -- Summary: Enrich back pressure stats per subtask in the WebUI Key: FLINK-20852 URL: https://issues.apache.org/jira/browse/FLINK-20852 Project: Flink Issue

[jira] [Created] (FLINK-20853) Add reader schema null check for AvroDeserializationSchema when recordClazz is GenericRecord

2021-01-05 Thread hailong wang (Jira)
hailong wang created FLINK-20853: Summary: Add reader schema null check for AvroDeserializationSchema when recordClazz is GenericRecord Key: FLINK-20853 URL: https://issues.apache.org/jira/browse/FLINK-20853

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Aljoscha Krettek
On 2021/01/05 10:16, Arvid Heise wrote: 1. I'd think that this is an orthogonal issue, which I'd solve separately. My gut feeling says that this is something we should only address for new sinks where we decouple the semantics of commits and checkpoints anyways. @Aljoscha Krettek any idea on thi

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Yun Gao
Hi Avrid, Very thanks for the feedbacks! For the second issue, sorry I think I might not make it very clear, I'm initially thinking the case that for example for a job with graph A -> B -> C, when we compute which tasks to trigger, A is still running, so we trigger A to

Re: [DISCUSS] Drop Scala 2.11

2021-01-05 Thread Aljoscha Krettek
There is some new enthusiasm for bringing Scala 2.13 support to Flink: https://issues.apache.org/jira/browse/FLINK-13414. One of the assumed prerequisites for this is dropping support for Scala 2.11 because it will be too hard (impossible) to try and support three Scala versions at the same ti

[jira] [Created] (FLINK-20854) Introduce BytesMultiMap to support buffering records

2021-01-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-20854: --- Summary: Introduce BytesMultiMap to support buffering records Key: FLINK-20854 URL: https://issues.apache.org/jira/browse/FLINK-20854 Project: Flink Issue Type: Sub-ta

Re: [DISCUSS] Drop Scala 2.11

2021-01-05 Thread Jeff Zhang
Glad to see someone in community would like to drive this effort. If scala 2.13 can do whatever scala 2.11 can do in flink (such as support scala-shell, scala lambda udf and etc), then I would be 100% support of dropping scala 2.11. Aljoscha Krettek 于2021年1月5日周二 下午11:01写道: > There is some new en

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Yun Gao
Hi Aljoscha, Very thanks for the feedbacks! For the second issue, I'm indeed thinking the race condition between deciding to trigger and operator get finished. And for this point, > One thought here is this: will there ever be intermediate operators that > should be run

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Arvid Heise
For 2) the race condition, I was more thinking of still injecting the barrier at the source in all cases, but having some kind of short-cut to immediately execute the RPC inside the respective taskmanager. However, that may prove hard in case of dynamic scale-ins. Nevertheless, because of this race

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Sebastian Liu
Hi Jinsong, Thx a lot for your suggestion. These points really need to be clear in the proposal. For the semantic problem, I think the main point is the different returned data types for the target aggregate function and the row format returned by the underlying storage. That's why we provide the

[jira] [Created] (FLINK-20855) Calculating numBuckets exceeds the maximum value of int and got a negative number

2021-01-05 Thread JieFang.He (Jira)
JieFang.He created FLINK-20855: -- Summary: Calculating numBuckets exceeds the maximum value of int and got a negative number Key: FLINK-20855 URL: https://issues.apache.org/jira/browse/FLINK-20855 Project

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi Sebastian, Well, I mean: `boolean applyAggregates(int[] groupingFields, List aggregateExpressions, DataType producedDataType);` VS ``` boolean applyAggregates(Aggregation agg); interface Aggregation { int[] groupingFields(); List aggregateExpressions(); DataType producedDataType(); } ``

[jira] [Created] (FLINK-20856) Separate the implementation of stream window aggregate nodes

2021-01-05 Thread godfrey he (Jira)
godfrey he created FLINK-20856: -- Summary: Separate the implementation of stream window aggregate nodes Key: FLINK-20856 URL: https://issues.apache.org/jira/browse/FLINK-20856 Project: Flink Iss

[jira] [Created] (FLINK-20858) Port StreamExecPythonGroupWindowAggregate and BatchExecPythonGroupWindowAggregate to Java

2021-01-05 Thread godfrey he (Jira)
godfrey he created FLINK-20858: -- Summary: Port StreamExecPythonGroupWindowAggregate and BatchExecPythonGroupWindowAggregate to Java Key: FLINK-20858 URL: https://issues.apache.org/jira/browse/FLINK-20858

[jira] [Created] (FLINK-20857) Separate the implementation of batch window aggregate nodes

2021-01-05 Thread godfrey he (Jira)
godfrey he created FLINK-20857: -- Summary: Separate the implementation of batch window aggregate nodes Key: FLINK-20857 URL: https://issues.apache.org/jira/browse/FLINK-20857 Project: Flink Issu

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
I think this may be over designed. We should have confidence in the interface we design, the interface should be stable. Wrapping things in a big context has a cost of losing user convenience. Foremost, we don't see any parameters to add in the future. Do you know any potential parameters? Best, J

[jira] [Created] (FLINK-20859) java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroParquetWriter

2021-01-05 Thread jack sun (Jira)
jack sun created FLINK-20859: Summary: java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroParquetWriter Key: FLINK-20859 URL: https://issues.apache.org/jira/browse/FLINK-20859 Project: Flink

Re: [DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-05 Thread Dian Fu
Hi all, I have updated the FLIP about temporal join, sql hints and window TVF. Regards, Dian > 在 2021年1月5日,上午11:58,Dian Fu 写道: > > Thanks a lot for your comments! > > Regarding to Python Table API examples: I thought it should be > straightforward about how to use these operations in Python

[VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Shuiqiang Chen
Hi devs, The discussion of the FLIP-153 [1] seems has reached a consensus through the mailing thread [2]. I would like to start a vote for it. The vote will be opened until 11th January (72h), unless there is an objection or no enough votes. Best, Shuiqiang [1]: https://cwiki.apache.org/conflue

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi Jark, I don't want to limit this interface to LocalAgg Push down. Actually, sometimes, we can push whole aggregation to source too. So, this rule can do something more advanced. For example, we can push down group sets to source too, for the SQL: "GROUP BY GROUPING SETS (f1, f2)". Then, we nee

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Dian Fu
+1 (binding) > 在 2021年1月6日,下午1:12,Shuiqiang Chen 写道: > > Hi devs, > > The discussion of the FLIP-153 [1] seems has reached a consensus through > the mailing thread [2]. I would like to start a vote for it. > > The vote will be opened until 11th January (72h), unless there is an > objection or

[jira] [Created] (FLINK-20860) Allow streaming operators to use managed memory

2021-01-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-20860: --- Summary: Allow streaming operators to use managed memory Key: FLINK-20860 URL: https://issues.apache.org/jira/browse/FLINK-20860 Project: Flink Issue Type: Sub-task

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Jark Wu
Thanks all for the discussion. I have created an issue FLINK-20860 [1] to support this. In conclusion, we will extend the configuration `taskmanager.memory.managed.consumer-weights` to have 2 more consumer kinds: OPERATOR and STATE_BACKEND, the available consumer kinds will be : * `OPERATOR` for

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi, I'm also curious about aggregate with filter (COUNT(1) FILTER(WHERE d > 1)). Can we push it down? I'm not sure that a single call expression can express it, and how we should embody it and convey it to users. Best, Jingsong On Wed, Jan 6, 2021 at 1:36 PM Jingsong Li wrote: > Hi Jark, > > I

Task scheduling of Flink

2021-01-05 Thread penguin.
Hello! Do you know how to modify the task scheduling method of Flink?

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Xingbo Huang
+1 (non-binding) Best, Xingbo Dian Fu 于2021年1月6日周三 下午1:38写道: > +1 (binding) > > > 在 2021年1月6日,下午1:12,Shuiqiang Chen 写道: > > > > Hi devs, > > > > The discussion of the FLIP-153 [1] seems has reached a consensus through > > the mailing thread [2]. I would like to start a vote for it. > > > > The

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Wei Zhong
+1 (non-binding) Best, Wei > 在 2021年1月6日,14:05,Xingbo Huang 写道: > > +1 (non-binding) > > Best, > Xingbo > > Dian Fu 于2021年1月6日周三 下午1:38写道: > >> +1 (binding) >> >>> 在 2021年1月6日,下午1:12,Shuiqiang Chen 写道: >>> >>> Hi devs, >>> >>> The discussion of the FLIP-153 [1] seems has reached a conse

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
I think filter expressions and grouping sets are semantic arguments instead of utilities. If we want to push them into sources, the connector developers should be aware of them. Wrapping them in a context implicitly is error-prone that the existing connector will produce wrong results when upgradi

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
> I think filter expressions and grouping sets are semantic arguments instead of utilities. If we want to push them into sources, the connector developers should be aware of them.Wrapping them in a context implicitly is error-prone that the existing connector will produce wrong results when upgradi

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Yun Tang
I think using managed memory within streaming operator is a good idea and I just have a question over last conclusion: If both OPERATOR and STATE_BACKEND set as 70 to align with previous behavior, what will happen if one slot has both consumers of managed streaming operator and state backend?

Re: [DISCUSS] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Yun Tang
The design looks good to me now. +1 to start the vote if there are no more comments.. Best Yun Tang From: Dian Fu Sent: Tuesday, January 5, 2021 13:32 To: dev@flink.apache.org Subject: Re: [DISCUSS] FLIP-153: Support state access in Python DataStream API Thanks

[jira] [Created] (FLINK-20861) Provide an option for serializing DECIMALs in JSON as plain number instead of scientific notation

2021-01-05 Thread Q Kang (Jira)
Q Kang created FLINK-20861: -- Summary: Provide an option for serializing DECIMALs in JSON as plain number instead of scientific notation Key: FLINK-20861 URL: https://issues.apache.org/jira/browse/FLINK-20861

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Yun Tang
+1 (binding) Best Yun Tang From: Wei Zhong Sent: Wednesday, January 6, 2021 14:07 To: dev Subject: Re: [VOTE] FLIP-153: Support state access in Python DataStream API +1 (non-binding) Best, Wei > 在 2021年1月6日,14:05,Xingbo Huang 写道: > > +1 (non-binding) > > Best

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Xintong Song
Thanks for driving the discussion, @Jark. The conclusion LGTM. @Yun, Since the streaming operators did not use managed memory previously, I don't think it's possible for any use cases with managed memory streaming operators to align with the previous behaviors. No matter how the consumer weights a