Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-26 Thread Timo Walther
Hi Becket, thanks for your feedback and the healthy discussion. I think the connector story will keep many of us busy for some time to come. It would be great if concepts from SQL could positively influence the design of Source/Sink abstractions. In particular, we should think about some guidelines of h

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-26 Thread Becket Qin
Hi Timo and Dawid, Thanks for the patient explanation. I just had a phone call with Kurt and Jark. I do see there are a few abstractions for which we only see a use case in SQL so far. Therefore while thinking of a Source abstraction that may be shared with different use cases semantics is theoretica

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-26 Thread Dawid Wysakowicz
Hi Becket, Generally I don't think connector developers should bother with understanding any of the SQL concepts. I am not sure if we understand "connector developer" the same way. Let me describe how I see the process of writing a new source (that can be used in both Table & DataStream API) 1.

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-26 Thread Becket Qin
Hi Timo, Regarding "connector developers just need to know how to write an > ExpressionToParquetFilter": > This is the entire purpose of the DynamicTableSource/DynamicTableSink. > The bridging between SQL concepts and connector specific concepts. > Because this is the tricky part. How to get fro

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-26 Thread Timo Walther
Hi Becket, Regarding "PushDown/NestedPushDown which is internal to optimizer": Those concepts cannot be entirely internal to the optimizer, at some point the optimizer needs to pass them into the connector specific code. This code will then convert it to e.g. Parquet expressions. So there must

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-25 Thread Becket Qin
Hi Timo, Thanks for the reply. I totally agree that there must be something new added to the connector in order to make it work for SQL / Table. My concern is mostly over what they should be, and how to add them. To be honest, I was kind of lost when looking at the interfaces such as DataStructure

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-25 Thread Timo Walther
Hi Becket, Let me clarify a few things first: Historically we thought of Table API/SQL as a library on top of DataStream API. Similar to Gelly or CEP. We used TypeInformation in Table API to integrate nicely with DataStream API. However, the last years have shown that SQL is not just a library

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-25 Thread Becket Qin
Hi Kurt, I do not object to promoting the concepts of SQL, but I don't think we should do that by introducing a new dedicated set of connector public interfaces that are only for SQL. The same argument can be applied to Gelly, CEP, and Machine Learning, claiming that they need to introduce a dedicated

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Kurt Young
Hi Becket, I don't think we should discuss this in pure engineering aspects. Your proposal is trying to let SQL connector developers understand as few SQL concepts as possible. But quite the opposite, we are designing those interfaces to emphasize the SQL concept, to bridge high level concepts in

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Becket Qin
Hi Jark, It is good to know that we do not expect the end users to touch those interfaces. Then the question boils down to whether the connector developers should be aware of the interfaces that are only used by the SQL optimizer. It seems a win if we can avoid that. Two potential solutions off

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Jark Wu
Hi Becket, Regarding Flavor1 and Flavor2, I want to clarify that users will never use a table source like this: { MyTableSource myTableSource = MyTableSourceFactory.create(); myTableSource.setSchema(mySchema); myTableSource.applyFilterPredicate(expression); ... } TableFactory and Tab

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Becket Qin
Hi Timo and Dawid, Thanks for the clarification. They really help. You are right that we are on the same page regarding the hierarchy. I think the only difference between our views is the flavor of the interfaces. There are two flavors of the source interface for DataStream and Table source. *Flav

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Dawid Wysakowicz
Hi Becket, I really think we don't have differing opinions. We might not see the changes in the same way yet. Personally I think of the DynamicTableSource as a factory for a Source implemented for the DataStream API. The important fact about the DynamicTableSource and all feature traits (Supp

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Timo Walther
Hi Becket, it is true that concepts such as projection and filtering are worth having in DataStream API as well. And a SourceFunction can provide interfaces for those concepts. In the table related classes we will generate runtime classes that adhere to those interfaces and deal with RowData

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Becket Qin
Hi Jark, However, the interfaces proposed by FLIP-95 are mainly used during > optimization (compiling), not runtime. Yes, I am aware of that, I am wondering whether the SQL planner can use the counterpart interface in the Source to apply the optimizations. It seems that should also work, right? If w

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Becket Qin
Hey Kurt, I don't think DataStream should see some SQL specific concepts such as > Filtering or ComputedColumn. Projectable and Filterable do not seem to be necessarily SQL concepts, but could be applicable to DataStream source as well to reduce the network load. For example ORC and Parquet should proba

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Jark Wu
Thanks Timo for updating the formats section. That would be very helpful for changelog support (FLIP-105). I just left 2 minor comments about some method names. In general, I'm +1 to start voting.

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Kurt Young
Hi Becket, I don't think DataStream should see some SQL specific concepts such as Filtering or ComputedColumn. It's better to stay within the SQL area and translate to more generic concepts when translating to the DataStream/Runtime layer, such as using a MapFunction to represent computed column logic. Best,
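
The idea in the message above, expressing a computed column as a generic per-record map once the SQL layer has been translated away, could look roughly like the sketch below. It is only an illustration under stated assumptions: `java.util.function.Function` stands in for Flink's `MapFunction` so that the example runs without Flink on the classpath, and the `Order` / `OrderWithTotal` types and the `total = price * quantity` column are hypothetical, not taken from the FLIP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ComputedColumnSketch {

    // Hypothetical physical row: only the columns the source actually stores.
    public static final class Order {
        public final String id;
        public final double price;
        public final int quantity;

        public Order(String id, double price, int quantity) {
            this.id = id;
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Hypothetical row after the computed column `total` has been appended.
    public static final class OrderWithTotal {
        public final String id;
        public final double price;
        public final int quantity;
        public final double total;

        public OrderWithTotal(String id, double price, int quantity, double total) {
            this.id = id;
            this.price = price;
            this.quantity = quantity;
            this.total = total;
        }
    }

    // The computed column as a plain per-record function. In a real job this
    // role would be played by a map operator; Function is a stand-in here.
    public static final Function<Order, OrderWithTotal> ADD_TOTAL =
            o -> new OrderWithTotal(o.id, o.price, o.quantity, o.price * o.quantity);

    // Applies the computed-column function to every record, the way a map
    // operator placed directly after the source would.
    public static List<OrderWithTotal> applyComputedColumn(List<Order> input) {
        List<OrderWithTotal> output = new ArrayList<>();
        for (Order order : input) {
            output.add(ADD_TOTAL.apply(order));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Order> orders = new ArrayList<>();
        orders.add(new Order("a", 2.5, 4));
        orders.add(new Order("b", 10.0, 1));
        for (OrderWithTotal row : applyComputedColumn(orders)) {
            System.out.println(row.id + " total=" + row.total);
        }
    }
}
```

The point of the sketch is that nothing in the mapping function knows about SQL: the source emits physical rows, and the derived column is ordinary record-to-record logic downstream.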

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Becket Qin
Hi Timo and Dawid, It's really great that we have the same goal. I am actually wondering if we can go one step further to avoid some of the interfaces in Table as well. For example, if we have the FilterableSource, do we still need the FilterableTableSource? Should DynamicTableSource just become

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Jingsong Li
+1. Thanks Timo for the design doc. We can also consider @Experimental too. But I am +1 to @PublicEvolving, we should be confident in the current change. Best, Jingsong Lee On Tue, Mar 24, 2020 at 4:30 PM Timo Walther wrote: > @Becket: We totally agree that we don't need table specific connect

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Timo Walther
@Becket: We totally agree that we don't need table specific connectors during runtime. As Dawid said, the interfaces proposed here are just for communication with the planner. Once the properties (watermarks, computed column, filters, projection etc.) are negotiated, we can configure a regular

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Dawid Wysakowicz
Hi Becket, Answering your question, we have the same intention not to duplicate connectors between the DataStream and Table APIs. The interfaces proposed in the FLIP are a way to describe relational properties of a source. The intention is as you described to translate all of those expressed as expres

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-23 Thread Kurt Young
Thanks Timo for the design doc. In general I'm +1 to this, with a minor comment. Since we introduced dozens of interfaces all at once, I'm not sure if it's good to annotate them with @PublicEvolving already. I can imagine these interfaces would only be stable after 1 or 2 major releases. Given the f

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-23 Thread Becket Qin
Hi Timo, Thanks for the proposal. I completely agree that the current Table connectors could be simplified quite a bit. I haven't finished reading everything, but here are some quick thoughts. Actually to me the biggest question is why should there be two different connector systems for DataStrea

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-23 Thread Dawid Wysakowicz
Hi Timo, Thank you for the proposal. I think it is an important improvement that will benefit many parts of the Table API. The proposal looks really good to me and personally I would be comfortable with voting on the current state. Best, Dawid On 23/03/2020 18:53, Timo Walther wrote: > Hi every

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-23 Thread Timo Walther
Hi everyone, I received some questions around how the new interfaces play together with formats and their factories. Furthermore, for MySQL or Postgres CDC logs, the format should be able to return a `ChangelogMode`. Also, I incorporated the feedback around the factory design in general. I

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-18 Thread Timo Walther
Hi Benchao, this is a very good question. I will update the FLIP about this. The legacy planner will not support the new interfaces. It will only support the old interfaces. With the next release, I think the Blink planner is stable enough to be the default one as well. Regards, Timo On 18.

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-18 Thread Benchao Li
Hi Timo, Thank you and others for the efforts to prepare this FLIP. The FLIP LGTM generally. +1 for moving blink data structures to table-common, it will be useful for UDFs too in the future. A little question is, do we plan to support the new interfaces and data types in the legacy planner? Or we only plan