Re: [DISCUSS] FLIP-153: Support state access in Python DataStream API

2021-01-04 Thread Yu Li
Thanks for driving the discussion Shuiqiang, and sorry for chiming in late. *bq. However, all the state access will be synchronized in the Java operator and so there will be no concurrent access to the state backend.* Could you add a section to explicitly mention this in the FLIP document? I think

[jira] [Created] (FLINK-20839) some mistakes in state.md and state_zh.md

2021-01-04 Thread wym_maozi (Jira)
wym_maozi created FLINK-20839: - Summary: some mistakes in state.md and state_zh.md Key: FLINK-20839 URL: https://issues.apache.org/jira/browse/FLINK-20839 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-20840) Projection pushdown doesn't work in temporal(lookup) join

2021-01-04 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-20840: -- Summary: Projection pushdown doesn't work in temporal(lookup) join Key: FLINK-20840 URL: https://issues.apache.org/jira/browse/FLINK-20840 Project: Flink Issue

Re: Task manger isn’t initiating with defined values in Flink 1.11 version as part of EMR 6.1.0

2021-01-04 Thread Till Rohrmann
Hi Deep, Flink has dropped support for specifying the number of TMs via -n since the introduction of Flip-6. Since then, Flink will automatically start TMs depending on the required resources. Hence, there is no need to specify the -n parameter anymore. Instead, you should specify the parallelism

[jira] [Created] (FLINK-20841) Fix compile error due to duplicated generated files

2021-01-04 Thread Matthias (Jira)
Matthias created FLINK-20841: Summary: Fix compile error due to duplicated generated files Key: FLINK-20841 URL: https://issues.apache.org/jira/browse/FLINK-20841 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-20842) Data row is smaller than a column index, internal schema representation is probably out of sync with real database schema

2021-01-04 Thread zhouenning (Jira)
zhouenning created FLINK-20842: -- Summary: Data row is smaller than a column index, internal schema representation is probably out of sync with real database schema Key: FLINK-20842 URL: https://issues.apache.org/jira

Re: [DISCUSS] FLIP-153: Support state access in Python DataStream API

2021-01-04 Thread Shuiqiang Chen
Hi Yu, Thanks a lot for your suggestions. I have addressed your inlined comments in the FLIP and also added a new section "State backed access synchronization" that explains the way to make sure there is no concurrent access to the state backend. Please have a look. Best, Shuiqiang Yu Li 于202

[jira] [Created] (FLINK-20843) UnalignedCheckpointITCase is unstable

2021-01-04 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-20843: Summary: UnalignedCheckpointITCase is unstable Key: FLINK-20843 URL: https://issues.apache.org/jira/browse/FLINK-20843 Project: Flink Issue Type: Bug

[DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-04 Thread Dian Fu
Hi all, I'd like to start a discussion about introducing a few convenient operations in Table API from the perspective of ease of use. Currently some tasks are not easy to express in Table API e.g. deduplication, topn, etc, or not easy to express when there are hundreds of columns in a table,

Re: [DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-04 Thread Wei Zhong
Hi Dian, Big +1 for making the Table API easier to use. Java users and Python users can both benefit from it. I think it would be better if we add some Python API examples. Best, Wei > 在 2021年1月4日,20:03,Dian Fu 写道: > > Hi all, > > I'd like to start a discussion about introducing a few con

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-04 Thread Aljoscha Krettek
I agree, we should allow streaming operators to use managed memory for other use cases. Do you think we need an additional "consumer" setting or that they would just use `DATAPROC` and decide by themselves what to use the memory for? Best, Aljoscha On 2020/12/22 17:14, Jark Wu wrote: Hi all

[jira] [Created] (FLINK-20844) Support add jar in SQL client

2021-01-04 Thread Rui Li (Jira)
Rui Li created FLINK-20844: -- Summary: Support add jar in SQL client Key: FLINK-20844 URL: https://issues.apache.org/jira/browse/FLINK-20844 Project: Flink Issue Type: New Feature Component

[jira] [Created] (FLINK-20845) Drop support for Scala 2.11

2021-01-04 Thread Nick Burkard (Jira)
Nick Burkard created FLINK-20845: Summary: Drop support for Scala 2.11 Key: FLINK-20845 URL: https://issues.apache.org/jira/browse/FLINK-20845 Project: Flink Issue Type: Sub-task Co

Re: [DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-04 Thread Seth Wiesman
This makes sense, I have some questions about method names. What do you think about renaming `dropDuplicates` to `deduplicate`? I don't think that drop is the right word to use for this operation, it implies records are filtered where this operator actually issues updates and retractions. Also, de

Re: Task manger isn’t initiating with defined values in Flink 1.11 version as part of EMR 6.1.0

2021-01-04 Thread DEEP NARAYAN Singh
Thanks Till, for the detailed explanation.I tried and it is working fine. Once again thanks for your quick response. Regards, -Deep On Mon, 4 Jan, 2021, 2:20 PM Till Rohrmann, wrote: > Hi Deep, > > Flink has dropped support for specifying the number of TMs via -n since the > introduction of F

[ANNOUNCE] Changes to generated avro files

2021-01-04 Thread Chesnay Schepler
Hello, in FLINK-20790 we have changed where generated Avro files are placed.Until then they were put directly under the src/ tree, with some committed to git, other being ignored via .gitignore. This has caused issues when switching branches (specifically, not being able to compile 1.11 afte

Re: [DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-04 Thread Timo Walther
Hi Dian, thanks for the proposed FLIP. I haven't taken a deep look at the proposal yet but will do so shortly. In general, we should aim to make the Table API as concise and self-explaining as possible. E.g. `dropna` does not sound obvious to me. Regarding `myTable.coalesce($("a"), 1).as("a"

[jira] [Created] (FLINK-20846) Decouple checkpoint services from CheckpointCoordinator

2021-01-04 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-20846: - Summary: Decouple checkpoint services from CheckpointCoordinator Key: FLINK-20846 URL: https://issues.apache.org/jira/browse/FLINK-20846 Project: Flink Iss

[jira] [Created] (FLINK-20847) Update CompletedCheckpointStore.shutdown() signature

2021-01-04 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-20847: - Summary: Update CompletedCheckpointStore.shutdown() signature Key: FLINK-20847 URL: https://issues.apache.org/jira/browse/FLINK-20847 Project: Flink

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Kurt Young
Local aggregation is more like a physical operator rather than logical operator. I would suggest going with idea #1. Best, Kurt On Wed, Dec 30, 2020 at 8:31 PM Sebastian Liu wrote: > Hi Jark, Thx a lot for your quick reply and valuable suggestions. > For (1): Agree: Since we are in the period

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Sebastian Liu
Hi Kurt, Thx a lot for your feedback. If local aggregation is more like a physical operator rather than logical operator, I think your suggestion should be idea #2 which handle all in the physical optimization phase? Looking forward for the further discussion. Kurt Young 于2021年1月5日周二 上午9:52写道:

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Kurt Young
Sorry for the typo -_-! I meant idea #2. Best, Kurt On Tue, Jan 5, 2021 at 10:59 AM Sebastian Liu wrote: > Hi Kurt, > > Thx a lot for your feedback. If local aggregation is more like a physical > operator rather than logical > operator, I think your suggestion should be idea #2 which handle a

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-04 Thread Jark Wu
Hi Aljoscha, I think we may need to divide `DATAPROC` into `OPERATOR` and `STATE_BACKEND`, because they have different scope (slot vs. operator). But @Xintong Song may have more insights on it. Best, Jark On Mon, 4 Jan 2021 at 20:44, Aljoscha Krettek wrote: > I agree, we should allow streami

Re: [DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-04 Thread Jark Wu
Thanks Dian, +1 to `deduplicate`. Regarding `myTable.coalesce($("a"), 1).as("a")`, I'm afraid it may conflict/confuse the built-in expression `coalesce(f0, 0)` (we may introduce it in the future). Besides that, could we also align other features of Flink SQL, e.g. event-time/processing-time temp

Re: [DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-04 Thread Dian Fu
Thanks a lot for your comments! Regarding to Python Table API examples: I thought it should be straightforward about how to use these operations in Python Table API and so have not added them. However, the suggestions make sense to me and I have added some examples about how to use them in Pyth

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Jark Wu
I'm also +1 for idea#2. Regarding to the updated interface, Result applyAggregates(List aggregateExpressions, int[] groupSet, DataType aggOutputDataType); final class Result { private final List acceptedAggregates; private final List remainingAggregates; } I have following co

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Sebastian Liu
Thanks for the clarification. I have resolved all of the comments and added a conclusion section. Looking forward to the further feedback from our community. If we get consensus on the design doc, I can push the implementation related work. Kurt Young 于2021年1月5日周二 上午11:04写道: > Sorry for the typ

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-04 Thread Xintong Song
+1 for allowing streaming operators to use managed memory. As for the consumer names, I'm afraid using `DATAPROC` for both streaming ops and state backends will not work. Currently, RocksDB state backend uses a shared piece of memory for all the states within that slot. It's not the operator's dec

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-04 Thread Jark Wu
+1 to Xingtong's proposal! Best, Jark On Tue, 5 Jan 2021 at 12:13, Xintong Song wrote: > +1 for allowing streaming operators to use managed memory. > > As for the consumer names, I'm afraid using `DATAPROC` for both streaming > ops and state backends will not work. Currently, RocksDB state back

Re: [DISCUSS] FLIP-153: Support state access in Python DataStream API

2021-01-04 Thread Dian Fu
Thanks Shuiqiang for driving this. The design looks good to me. +1 to start the vote if there are no more comments. Regards, Dian > 在 2021年1月4日,下午7:40,Shuiqiang Chen 写道: > > Hi Yu, > > Thanks a lot for your suggestions. > > I have addressed your inlined comments in the FLIP and also added a

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-04 Thread Jingsong Li
+1 for allowing streaming operators to use managed memory. The memory use of streams requires some hierarchy, and the bottom layer is undoubtedly the current StateBackend. Let the stream operators freely use the managed memory, which will make the memory management model to be unified and give the

[DISCUSS]Some thoughts about CatalogPartitionSpec

2021-01-04 Thread Jun Zhang
Hello dev: Now I encounter a problem when using the method "Catalog#listPartitions(ObjectPath, CatalogPartitionSpec)". I found that the partitionSpec type in CatalogPartitionSpec is Map, This is no problem for hivecatalog, but my subclass of Catalog needs precise types. For example,

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

2021-01-04 Thread Jark Wu
Hi Jun, I'm curious why it doesn't work when represented in string? You can get the field type from the CatalogTable#getSchema(), then parse/cast the partition value to the type you want. Best, Jark On Tue, 5 Jan 2021 at 13:43, Jun Zhang wrote: > Hello dev: > Now I encounter a problem w

[Discussion] Let catalog manage temporary catalog objects

2021-01-04 Thread Rui Li
Hi Dev, I'd like to start a discussion about whether we can let catalog handle temporary catalog objects. Currently temporary table/view is managed by CatalogManager, and temporary function is managed by FunctionCatalog. This causes problems when I try to support temporary hive table/function. Fo

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Sebastian Liu
Hi Jark, Thx for your further feedback and help. The interface of SupportsAggregatePushDown may indeed need some adjustments. For (1) Agree: Yeah, the upstream only need to know if the TableSource can handle all of the aggregates. It's better to just return a boolean type to indicate whether all

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

2021-01-04 Thread Jun Zhang
hi ,Jack: If the partition type is int and we pass in a string type, the system will throw an exception that the type does not match. We can indeed cast by get the schema, but I think if CatalogPartitionSpec#partitionSpec is of type Map, there is no need to do cast operation, and the universal and

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Jingsong Li
Thanks for your proposal! Sebastian. +1 for SupportsAggregatePushDown. The above wonderful discussion has solved many of my concerns. ## Semantic problems We may need to add some mechanisms or comments, because as far as I know, the semantics of each database is actually different, which may nee

[jira] [Created] (FLINK-20848) Kafka consumer ID is not specified correctly in new KafkaSource

2021-01-04 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-20848: - Summary: Kafka consumer ID is not specified correctly in new KafkaSource Key: FLINK-20848 URL: https://issues.apache.org/jira/browse/FLINK-20848 Project: Flink

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

2021-01-04 Thread Jark Wu
Hi Jun, AFAIK, the main reason to use Map is because it's easy for serialization and deserialization. For example, if we use Java `LocalDateTime` instead of String to represent TIMESTAMP partition value, then users may deserialize into Java `Timestamp` to Flink framework, which may cause problems.

Re: [DISCUSS]Some thoughts about CatalogPartitionSpec

2021-01-04 Thread Jun Zhang
hi,jark: Thanks for your explanation. I am doing the integration of flink and iceberg. The iceberg partition needs to be of accurate type, and I cannot modify it. I will follow what you suggestion, get the column type by schema, and then do the cast. Jark Wu 于2021年1月5日周二 下午3:05写道: > Hi Jun, >

[jira] [Created] (FLINK-20849) Improve JavaDoc and logging of new KafkaSource

2021-01-04 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-20849: - Summary: Improve JavaDoc and logging of new KafkaSource Key: FLINK-20849 URL: https://issues.apache.org/jira/browse/FLINK-20849 Project: Flink Issue Type: