Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
Hey Aljoscha A couple of thoughts for the two remaining TODOs in the doc: # Processing Time Support in BATCH/BOUNDED execution mode I think there are two somewhat orthogonal problems around this topic:     1. Firing processing timers at the end of the job     2. Having processing timers in the B

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-08 Thread Timo Walther
Hi everyone, I updated the FLIP again and hope that I could address the mentioned concerns. @Leonard: Thanks for the explanation. I wasn't aware that ts_ms and source.ts_ms have different semantics. I updated the FLIP and expose the most commonly used properties separately. So frequently use

[jira] [Created] (FLINK-19163) Add building py38 wheel package of PyFlink in Azure CI

2020-09-08 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-19163: Summary: Add building py38 wheel package of PyFlink in Azure CI Key: FLINK-19163 URL: https://issues.apache.org/jira/browse/FLINK-19163 Project: Flink Issue

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-08 Thread Jark Wu
Hi Timo, The updated CAST SYSTEM_METADATA behavior sounds good to me. I just noticed that a BIGINT can't be converted to "TIMESTAMP(3) WITH LOCAL TIME ZONE". So maybe we need to support this, or use "TIMESTAMP(3) WITH LOCAL TIME ZONE" as the defined type of Kafka timestamp? I think this makes sens

Re: [VOTE] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-08 Thread Aljoscha Krettek
+1 We just need to make sure to find a good name before the release but shouldn't block any work on this. Aljoscha On 08.09.20 07:59, Xintong Song wrote: Thanks for the vote, @Jincheng. Concerning the namings, the original idea was, as you suggested, to have separate configuration names fo

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-08 Thread Dawid Wysakowicz
Hey Kurt, Thank you for comments! Ad. 1 I might have missed something here, but as far as I see it is that using the current execution stack with regular state backends (RocksDB in particular if we want to have spilling capabilities) is equivalent to hash-based execution. I can see a different sp

Re: Merge SupportsComputedColumnPushDown and SupportsWatermarkPushDown

2020-09-08 Thread Timo Walther
Hi Shengkai, first of I would not consider the section Problems of SupportsWatermarkPushDown" as a "problem". It was planned to update the WatermarkProvider once the interfaces are ready. See the comment in WatermarkProvider: // marker interface that will be filled after FLIP-126: // Waterma

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-08 Thread Timo Walther
Hi Jark, according to Flink's and Calcite's casting definition in [1][2] TIMESTAMP WITH LOCAL TIME ZONE should be castable from BIGINT. If not, we will make it possible ;-) I'm aware of DeserializationSchema.getProducedType but I think that this method is actually misplaced. The type should

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-08 Thread Konstantin Knauf
Hi Becket, Thank you for picking up this FLIP. I have a few questions: * two thoughts on naming: * idleTime: In the meantime, a similar metric "idleTimeMsPerSecond" has been introduced in https://issues.apache.org/jira/browse/FLINK-16864. They have a similar name, but different definitions of

1.11.1 Hive connector doesn't work with Hive 1.0 or 1.1

2020-09-08 Thread Rui Li
Hello dev, A user hits the following issue when using Flink 1.11.1 to connect to Hive 1.1.0: java.lang.NoSuchMethodError: > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(Lorg/apache/hadoop/conf/Configuration;)V > at > org.apache.flink.table.catalog.hive.client.HiveShimV100.getHiveMetastor

Re: 1.11.1 Hive connector doesn't work with Hive 1.0 or 1.1

2020-09-08 Thread Dian Fu
Hi Rui, The maven artifacts are built using the script: releasing/deploy_staging_jars.sh [1]. Regards, Dian [1] https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release > 在 2020年9月8日,下午7:19,Rui Li 写道: > > maven artifacts

Re: Merge SupportsComputedColumnPushDown and SupportsWatermarkPushDown

2020-09-08 Thread Jark Wu
Hi Timo, Regarding "pushing other computed columns into source, e.g. encrypted records/columns, performing checksum checks, reading metadata etc.", I'm not sure about this. 1. the planner don't know which computed column should be pushed into source 2. it seems that we can't improve performances i

Re: 1.11.1 Hive connector doesn't work with Hive 1.0 or 1.1

2020-09-08 Thread Rui Li
Thanks Dian. The script looks all right to me. I'll double check with the user whether the issue is related to his building environment. On Tue, Sep 8, 2020 at 7:36 PM Dian Fu wrote: > Hi Rui, > > The maven artifacts are built using the > script: releasing/deploy_staging_jars.sh [1]. > > Regards

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-08 Thread Danny Chan
Hi, Timo ~ "It is not about changelog mode compatibility, it is about the type compatibility.” For fromDataStream(dataStream, Schema), there should not be compatibility problem or data type inconsistency. We know the logical type from Schema and physical type from the dataStream itself. For to

Re: Merge SupportsComputedColumnPushDown and SupportsWatermarkPushDown

2020-09-08 Thread Shengkai Fang
Hi Timo and Jark.Thanks for your replies. Maybe I don't explain clearly in doc. I think the main reason behind is we have no means to distinguish the calc in LogicalProject. Let me give you an example to illustrate the reason. Assume we have 2 cases: case 1: create table MyTable ( int a, int

Re: [VOTE] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-08 Thread Yu Li
+1 Best Regards, Yu On Tue, 8 Sep 2020 at 17:03, Aljoscha Krettek wrote: > +1 > > We just need to make sure to find a good name before the release but > shouldn't block any work on this. > > Aljoscha > > On 08.09.20 07:59, Xintong Song wrote: > > Thanks for the vote, @Jincheng. > > > > > > Con

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-08 Thread Timo Walther
Hi Danny, Your proposed signatures sound good to me. fromDataStream(dataStream, Schema) toDataStream(table, AbstractDataType) They address all my concerns. The API would not be symmetric anymore, but this is fine with me. Others raised concerns about deprecating `fromDataStream(dataStream, Ex

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-08 Thread Kurt Young
Regarding #1, yes the state backend is definitely hash-based execution. However there are some differences between batch hash-based execution. The key difference is *random access & read/write mixed workload". For example, by using state backend in streaming execution, one have to mix the read and

Re: Merge SupportsComputedColumnPushDown and SupportsWatermarkPushDown

2020-09-08 Thread Timo Walther
Hi Jark, Hi Shengkai, "shall we push the expressions in the following Projection too?" This is something that we should at least consider. I also don't find a strong use case. But what I see is that we are merging concepts that actually can be separated. And we are executing the same code twi

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-08 Thread Becket Qin
On Tue, Sep 8, 2020 at 6:55 PM Konstantin Knauf wrote: > Hi Becket, > > Thank you for picking up this FLIP. I have a few questions: > > * two thoughts on naming: >* idleTime: In the meantime, a similar metric "idleTimeMsPerSecond" has > been introduced in https://issues.apache.org/jira/browse

Re: Merge SupportsComputedColumnPushDown and SupportsWatermarkPushDown

2020-09-08 Thread Shengkai Fang
Hi, Timo. I agree with you that the concepts Watermark and ComputedColumn should be separated. However, we are merging the interface SupportsCalcPushDown and SupportsWatermarkPushDown actually. The concept computed column has disappeared in optimization. As for the drawback you mentiond, I have alr

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-08 Thread Becket Qin
Hey Konstantin, Thanks for the feedback and suggestions. Please see the reply below. * idleTime: In the meantime, a similar metric "idleTimeMsPerSecond" has > been introduced in https://issues.apache.org/jira/browse/FLINK-16864. They > have a similar name, but different definitions of idleness, >

[jira] [Created] (FLINK-19164) Release scripts break other dependency versions unintentionally

2020-09-08 Thread Serhat Soydan (Jira)
Serhat Soydan created FLINK-19164: - Summary: Release scripts break other dependency versions unintentionally Key: FLINK-19164 URL: https://issues.apache.org/jira/browse/FLINK-19164 Project: Flink

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-08 Thread Leonard Xu
HI, Timo Thanks for driving this FLIP. Sorry but I have a concern about Writing metadata via DynamicTableSink section: CREATE TABLE kafka_table ( id BIGINT, name STRING, timestamp AS CAST(SYSTEM_METADATA("timestamp") AS BIGINT) PERSISTED, headers AS CAST(SYSTEM_METADATA("headers") AS MAP

Re: 1.11.1 Hive connector doesn't work with Hive 1.0 or 1.1

2020-09-08 Thread Rui Li
Verified the issue was related to the building environment. The published jar is good. Thanks Dian for the help. On Tue, Sep 8, 2020 at 7:49 PM Rui Li wrote: > Thanks Dian. The script looks all right to me. I'll double check with the > user whether the issue is related to his building environmen

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-08 Thread Dawid Wysakowicz
Ad. 1 Yes, you are right in principle. Let me though clarify my proposal a bit. The proposed sort-style execution aims at a generic KeyedProcessFunction were all the "aggregations" are actually performed in the user code. It tries to improve the performance by actually removing the need to use Ro

Re: Flink stateful functions : compensating callback to invoked functions on a timeout

2020-09-08 Thread Igal Shilman
Hi, Thanks for posting this interesting question, and welcome to StateFun! :-) The first thing that I would like to mention is that, if your original motivation for that scenario is a concern of a transient failures such as: - did function Y ever received a message sent by function X ? - did sendi

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-08 Thread Timo Walther
Hi Leonard, the only alternative I see is that we introduce a concept that is completely different to computed columns. This is also mentioned in the rejected alternative section of the FLIP. Something like: CREATE TABLE kafka_table ( id BIGINT, name STRING, timestamp INT SYSTEM_M

[DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-08 Thread Seth Wiesman
Hi Devs, I'd like to propose an update to how state backends and checkpoint storage are configured to help users better understand Flink. Apache Flink's durability story is a mystery to many users. One of the most common recurring questions from users comes from not understanding the relationship

[jira] [Created] (FLINK-19165) Clean up the UnilateralSortMerger

2020-09-08 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-19165: Summary: Clean up the UnilateralSortMerger Key: FLINK-19165 URL: https://issues.apache.org/jira/browse/FLINK-19165 Project: Flink Issue Type: Bug

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-08 Thread Becket Qin
Hi Stephan, Thanks for the input. Just a few more clarifications / questions. *Num Bytes / Records Metrics* 1. At this point, the *numRecordsIn(Rate)* metrics exist in both OperatorIOMetricGroup and TaskIOMetricGroup. I did not find *numRecordsIn(Rate)* in the TaskIOMetricGroup updated anywhere

[jira] [Created] (FLINK-19166) StreamingFileWriter should register Listener before the initialization of buckets

2020-09-08 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-19166: Summary: StreamingFileWriter should register Listener before the initialization of buckets Key: FLINK-19166 URL: https://issues.apache.org/jira/browse/FLINK-19166 Pro

Re: [DISCUSS] Releasing Flink 1.11.2

2020-09-08 Thread Jingsong Li
Hi Zhu Zhu, Add a new blocker: https://issues.apache.org/jira/browse/FLINK-19166 Will fix it soon. Best, Jingsong On Tue, Sep 8, 2020 at 12:26 AM Zhu Zhu wrote: > Hi All, > > Since there are still two 1.11.2 blockers items in progress, RC1 creation > will be postponed to tomorrow. > > Thanks,

Re: [VOTE] FLIP-137: Support Pandas UDAF in PyFlink

2020-09-08 Thread Xingbo Huang
Hi all, Thanks a lot for the discussion and votes. I will summarize the result in a separate email. Best, Xingbo Zhu Zhu 于2020年9月8日周二 上午11:16写道: > +1 > > Thanks, > Zhu > > Hequn Cheng 于2020年9月8日周二 上午8:57写道: > > > +1 (binding) > > > > > > On Tue, Sep 8, 2020 at 7:43 AM jincheng sun > > wrote:

[RESULT][VOTE] FLIP-137: Support Pandas UDAF in PyFlink

2020-09-08 Thread Xingbo Huang
Hi all, The voting time for FLIP-137[1] has passed. I'm closing the vote now. There 6 + 1 votes, 4 of which are binding: - Dian (binding) - Jincheng (binding) - Hequn (binding) - Zhu Zhu (binding) - Wei (non-binding) - Shuiqiang (non-binding) There are no disapproving votes. Thus, FLIP-137 has

Re: [DISCUSS] Releasing Flink 1.11.2

2020-09-08 Thread Jingsong Li
Hi Zhu Zhu, Replenish its[1] influence: For HiveStreamingSink & FileSystemSink in Table/SQL, partition commit make partition visible for downstream Hive/Spark engines. But due to FLINK-19166, will lose some partitions to commit after Job failover in some cases, especially for short partitions. In

Re: [DISCUSS] Releasing Flink 1.11.2

2020-09-08 Thread Zhu Zhu
Thanks for reporting this issue and offering to fix it @Jingsong Li Agreed it is a reasonable blocker. I will postpone 1.11.2 RC1 creation until it is fixed. Thanks, Zhu Jingsong Li 于2020年9月9日周三 上午11:27写道: > Hi Zhu Zhu, > > Replenish its[1] influence: > For HiveStreamingSink & FileSystemSink i

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-08 Thread Kurt Young
I doubt that any sorting algorithm would work with only knowing the keys are different but without information of which is greater. Best, Kurt On Tue, Sep 8, 2020 at 10:59 PM Dawid Wysakowicz wrote: > Ad. 1 > > Yes, you are right in principle. > > Let me though clarify my proposal a bit. The

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-08 Thread Danny Chan
Thanks for the summary Timo ~ I want to clarify a little bit, what is the conclusion about the fromChangelogStream and toChangelogStream, should we use this name or we use fromDataStream with an optional ChangelogMode flag ? Best, Danny Chan 在 2020年9月8日 +0800 PM8:22,Timo Walther ,写道: > Hi Danny

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-08 Thread Jark Wu
Hi, I'm +1 to use the naming of from/toDataStream, rather than from/toInsertStream. So we don't need to deprecate the existing `fromDataStream`. I'm +1 to Danny's proposal: fromDataStream(dataStream, Schema) and toDataStream(table, AbstractDataType) I think we can also keep the method `createTem

[jira] [Created] (FLINK-19168) Upgrade Kafka client version

2020-09-08 Thread darion yaphet (Jira)
darion yaphet created FLINK-19168: - Summary: Upgrade Kafka client version Key: FLINK-19168 URL: https://issues.apache.org/jira/browse/FLINK-19168 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-19167) Proccess Function Example could not work

2020-09-08 Thread tinny cat (Jira)
tinny cat created FLINK-19167: - Summary: Proccess Function Example could not work Key: FLINK-19167 URL: https://issues.apache.org/jira/browse/FLINK-19167 Project: Flink Issue Type: Bug

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-08 Thread Kurt Young
I also share the concern that reusing the computed column syntax but have different semantics would confuse users a lot. Besides, I think metadata fields are conceptually not the same with computed columns. The metadata field is a connector specific thing and it only contains the information that

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-08 Thread Timo Walther
The conclusion is that we will drop `fromChangelogStream(ChangelogMode, DataStream)` but will keep `fromChangelogStream(DataStream)`. The latter is necessary to have a per-record changeflag. We could think about merging `fromChangelogStream`/`fromDataStream` and `toChangelogStream`/`toDataStre