Re: [DISCUSS] Connectors and NULL handling

2019-06-23 Thread Xiaowei Jiang
Error handling policy for streaming jobs goes beyond potential corrupted messages in the source. Users may have subtle bugs while processing some messages which may cause the streaming jobs to fail. Even though this can be considered as a bug in user's code, users may prefer skip such messages (or

Re: Specifying parallelism on join operation

2019-06-23 Thread Xiaowei Jiang
You can use with(JoinFunction) to workaround it. See JavaDoc for Flink 1.8: @PublicEvolving

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Xiaowei Jiang
Hi Gordon & Seth, this looks like a very useful feature for analyze and manage states.  I agree that using DataSet is probably the most practical choice right now. But in the longer adding the TableAPI support for this will be nice. When analyzing the savepoint, I assume that the state backend r

Re: [VOTE] Release 1.8.0, release candidate #4

2019-03-26 Thread Xiaowei Jiang
+1 (non-binding) - checked checksums and GPG files - build from source successfully- run end-to-end precommit tests successfully- run end-to-end nightly tests successfully Xiaowei On Tuesday, March 26, 2019, 8:09:19 PM GMT+8, Yu Li wrote: +1 (non-binding) - Checked release notes: OK

Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-21 Thread Xiaowei Jiang
Thanks Stephan! We are hoping to make the process as non-disruptive as possible to the Flink community. Making the Blink codebase public is the first step that hopefully facilitates further discussions. Xiaowei On Monday, January 21, 2019, 11:46:28 AM PST, Stephan Ewen wrote: Dear Fl

Re: [DISCUSS] Support Interactive Programming in Flink Table API

2018-11-22 Thread Xiaowei Jiang
Relying on a callback for the temp table for clean up is not very reliable. There is no guarantee that it will be executed successfully. We may risk leaks when that happens. I think that it's safer to have an association between temp table and session id. So we can always clean up temp tables which

Re: [DISCUSS] Long-term goal of making flink-table Scala-free

2018-11-22 Thread Xiaowei Jiang
Hi Timo, thanks for driving this! I think that this is a nice thing to do. While we are doing this, can we also keep in mind that we want to eventually have a TableAPI interface only module which users can take dependency on, but without including any implementation details? Xiaowei On Thu, Nov 2

Re: [DISCUSS] Table API Enhancement Outline

2018-11-18 Thread Xiaowei Jiang
sume we have a table: > >>>> val tab = util.addTable[(Long, String)]("MyTable", 'long, 'string, > >>>> 'proctime.proctime) > >>>> > >>>> Approach 1: > >>>> case1: Map follows Source Table > >>>> val res

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-17 Thread Xiaowei Jiang
Thanks Xuefu for the detailed design doc! One question on the properties associated with the catalog objects. Are we going to leave them completely free form or we are going to set some standard for that? I think that the answer may depend on if we want to explore catalog specific optimization oppo

Re: [DISCUSS] Task speculative execution for Flink batch

2018-11-17 Thread Xiaowei Jiang
Thanks Yangyu for the nice design doc! One thing to consider is the granularity of speculation. Multiple task may propagate data through pipeline mode. In such case, fixing a single task may not be enough. But you might be able to fix this problem by increasing the granularity of speculation. The t

Re: Flink sql joined with dimtable from mysql

2018-11-13 Thread Xiaowei Jiang
It was not super clean on what you did. But from your description, the join was not correct initially because existing MySQL data was not seen by Flink yet. Later when updates are seen by Flink, the result will be correct. A better place for such question is probably on the user mailing list. Xiao

Re: [DISCUSS] Table API Enhancement Outline

2018-11-07 Thread Xiaowei Jiang
pdate. But we need review and design this carefully, > > especially taking into account the cases of the failover (instead of just > > back-up the ACC it may also needs to remember the emit offset) and > > retractions, as the semantics of TableAggregateFunction emit are > dif

Re: [DISCUSS] Enhancing the functionality and productivity of Table API

2018-11-07 Thread Xiaowei Jiang
Hi Piotr: I want to clarify one thing first: I think that we will keep the interoperability between TableAPI and DataStream in any case. So user can switch between the two whenever needed. Given that, it would still be very helpful that users can use one API to achieve most of what they do. Curren

Re: [DISCUSS] Enhancing the functionality and productivity of Table API

2018-11-06 Thread Xiaowei Jiang
Hi Fabian, I totally agree with you that we should incrementally improve TableAPI. We don't suggest that we do anything drastic such as replacing DataSet API yet. We should see how much we can achieve by extending TableAPI cleanly. By then, we should see if there are any natural boundaries on how

Re: [DISCUSS] Table API Enhancement Outline

2018-11-06 Thread Xiaowei Jiang
Hi Jincheng, Thanks for adding the public interfaces! I think that it's a very good start. There are a few points that we need to have more discussions. - TableAggregateFunction - this is a very complex beast, definitely the most complex user defined objects we introduced so far. I think th

[DISCUSS] Table API Enhancement Outline

2018-11-05 Thread Xiaowei Jiang
Hi All, As Jincheng brought up in the previous email, there are a set of improvements needed to make Table API more complete/self-contained. To give a better overview on this, Jincheng, Jiangjie, Shaoxuan and myself discussed offline a bit and came up with an initial outline. Table API Enhancemen

Re: [DISCUSS] Enhancing the functionality and productivity of Table API

2018-11-05 Thread Xiaowei Jiang
Hi Fabian, these are great questions! I have some quick thoughts on some of these. Optimization opportunities: I think that you are right UDFs are more like blackboxes today. However this can change if we let user develop UDFs symbolically in the future (i.e., Flink will look inside the UDF code,

Re: [DISCUSS] Release 1.3.0 RC1 (Non voting, testing release candidate)

2017-05-19 Thread Xiaowei Jiang
Hi Robert, I did the following checks and found no issues: - Check if checksums and GPG files match the corresponding release files - Verify that the source archives do not contain any binaries - Check if the source release is building properly with Maven (including license header check and

[jira] [Created] (FLINK-6621) Legal check for 1.3.0 RC01 Release

2017-05-17 Thread Xiaowei Jiang (JIRA)
Xiaowei Jiang created FLINK-6621: Summary: Legal check for 1.3.0 RC01 Release Key: FLINK-6621 URL: https://issues.apache.org/jira/browse/FLINK-6621 Project: Flink Issue Type: Bug

Re: [DISCUSS] FLIP-17 Side Inputs

2017-03-23 Thread Xiaowei Jiang
Very nice discussion! The deadlock issue due to back pressure mechanism is temporary, which is going to be fixed once Stephan change it to a credit based approach. So we probably should not base our proposal on that temporary limitation. Once we have that issue fixed, the operator can choose to

Add partitionedKeyBy to DataStream

2016-10-20 Thread Xiaowei Jiang
After we do any interesting operations (e.g. reduce) on KeyedStream, the result becomes DataStream. In a lot of cases, the output still has the same or compatible keys with the KeyedStream (logically). But to do further operations on these keys, we are forced to use keyby again. This works semantic

Efficient Batch Operator in Streaming

2016-10-20 Thread Xiaowei Jiang
Very often, it's more efficient to process a batch of records at once instead of processing them one by one. We can use window to achieve this functionality. However, window will store all records in states, which can be costly. It's desirable to have an efficient implementation of batch operator.

[jira] [Created] (FLINK-4855) Add partitionedKeyBy to DataStream

2016-10-18 Thread Xiaowei Jiang (JIRA)
Xiaowei Jiang created FLINK-4855: Summary: Add partitionedKeyBy to DataStream Key: FLINK-4855 URL: https://issues.apache.org/jira/browse/FLINK-4855 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-4854) Efficient Batch Operator in Streaming

2016-10-18 Thread Xiaowei Jiang (JIRA)
Xiaowei Jiang created FLINK-4854: Summary: Efficient Batch Operator in Streaming Key: FLINK-4854 URL: https://issues.apache.org/jira/browse/FLINK-4854 Project: Flink Issue Type: Improvement