+1. Thanks Timo for the design doc. We could also consider @Experimental, but I am +1 to @PublicEvolving: we should be confident in the current change.
Best,
Jingsong Lee

On Tue, Mar 24, 2020 at 4:30 PM Timo Walther <twal...@apache.org> wrote:

> @Becket: We totally agree that we don't need table-specific connectors
> during runtime. As Dawid said, the interfaces proposed here are just for
> communication with the planner. Once the properties (watermarks,
> computed columns, filters, projection, etc.) are negotiated, we can
> configure a regular Flink connector.
>
> E.g. setting the watermark assigner and deserialization schema of a
> Kafka connector.
>
> For better separation of concerns, Flink connectors should not include
> relational interfaces and depend on flink-table. This is the
> responsibility of table sources/sinks.
>
> @Kurt: I would like to mark them @PublicEvolving already because we need
> to deprecate the old interfaces as early as possible. We cannot redirect
> to @Internal interfaces. They are not marked @Public, so we can still
> evolve them. But a core design shift should not happen again; it would
> leave a bad impression if we redesigned over and over again. Instead,
> we should be confident in the current change.
>
> Regards,
> Timo
>
>
> On 24.03.20 09:20, Dawid Wysakowicz wrote:
> > Hi Becket,
> >
> > Answering your question, we have the same intention not to duplicate
> > connectors between the DataStream and Table APIs. The interfaces
> > proposed in the FLIP are a way to describe the relational properties
> > of a source. The intention is, as you described, to translate all of
> > those expressed as expressions or other Table-specific structures into
> > a DataStream source. In other words, I think what we are doing here is
> > in line with what you described.
> >
> > Best,
> >
> > Dawid
> >
> > On 24/03/2020 02:23, Becket Qin wrote:
> >> Hi Timo,
> >>
> >> Thanks for the proposal. I completely agree that the current Table
> >> connectors could be simplified quite a bit. I haven't finished reading
> >> everything, but here are some quick thoughts.
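[Editor's note: Timo's point above, that the table source only negotiates with the planner and then configures a plain runtime connector, can be sketched as follows. All names here (`RowDeserializer`, `KafkaTableSource`, etc.) are illustrative stand-ins, not actual Flink API.]

```java
import java.io.Serializable;
import java.util.function.Function;

public class NegotiationSketch {

    /** Plain runtime-side deserializer, independent of flink-table. */
    interface RowDeserializer extends Serializable, Function<byte[], String> {}

    /** Plain runtime connector: knows nothing about planner concepts. */
    static class KafkaRuntimeSource {
        final String topic;
        final RowDeserializer deserializer;

        KafkaRuntimeSource(String topic, RowDeserializer deserializer) {
            this.topic = topic;
            this.deserializer = deserializer;
        }
    }

    /** Table-specific side: owns the planner communication, then hands
     *  plain objects to the runtime connector once negotiation is done. */
    static class KafkaTableSource {
        private RowDeserializer negotiated = bytes -> new String(bytes);

        /** Called during negotiation, e.g. after push-downs have fixed
         *  which fields the format must produce. */
        void applyNegotiatedFormat(RowDeserializer format) {
            this.negotiated = format;
        }

        KafkaRuntimeSource toRuntimeSource() {
            return new KafkaRuntimeSource("orders", negotiated);
        }
    }

    public static void main(String[] args) {
        KafkaTableSource tableSource = new KafkaTableSource();
        // Pretend the planner negotiated an upper-casing format.
        tableSource.applyNegotiatedFormat(bytes -> new String(bytes).toUpperCase());
        KafkaRuntimeSource runtime = tableSource.toRuntimeSource();
        System.out.println(runtime.deserializer.apply("abc".getBytes()));
    }
}
```

The key separation: `KafkaRuntimeSource` never sees planner types, matching the goal that Flink connectors should not depend on flink-table.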
> >>
> >> Actually, to me the biggest question is: why should there be two
> >> different connector systems for DataStream and Table? What is the
> >> fundamental reason that prevents us from merging them into one?
> >>
> >> The basic functionality of a connector is to provide capabilities to
> >> do IO and serde. Conceptually, Table connectors should just be
> >> DataStream connectors that are dealing with Rows. It seems that quite
> >> a few of the special connector requirements are just a specific way to
> >> do IO / serde. Taking SupportsFilterPushDown as an example, imagine we
> >> have the following interface:
> >>
> >> interface FilterableSource<PREDICATE> {
> >>     void applyFilterable(Supplier<PREDICATE> predicate);
> >> }
> >>
> >> And if a ParquetSource would like to support filtering, it would become:
> >>
> >> class ParquetSource implements Source, FilterableSource<FilterPredicate> {
> >>     ...
> >> }
> >>
> >> For Table, one just needs to provide a predicate supplier that
> >> converts an Expression to the specified predicate type. This has a few
> >> benefits:
> >> 1. The same unified filtering API for sources, regardless of
> >> DataStream or Table.
> >> 2. DataStream users can now also use the ExpressionToPredicate
> >> supplier if they want to.
> >>
> >> To summarize, my main point is that I am wondering if it is possible
> >> to have a single set of connector interfaces for both Table and
> >> DataStream, rather than having two hierarchies. I am not 100% sure
> >> this would work, but if it does, it would be a huge win from both a
> >> code maintenance and a user experience perspective.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >>
> >> On Tue, Mar 24, 2020 at 2:03 AM Dawid Wysakowicz <dwysakow...@apache.org>
> >> wrote:
> >>
> >>> Hi Timo,
> >>>
> >>> Thank you for the proposal. I think it is an important improvement that
> >>> will benefit many parts of the Table API.
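[Editor's note: Becket's `FilterableSource` sketch above could be fleshed out roughly as below. Everything here is illustrative: `FilterPredicate` is a simplified stand-in for Parquet's class, and the string-based "expression" stands in for a Table API Expression.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class FilterableSourceSketch {

    /** The interface proposed in the mail: the predicate type is generic,
     *  so the same contract serves DataStream and Table users. */
    interface FilterableSource<PREDICATE> {
        void applyFilterable(Supplier<PREDICATE> predicate);
    }

    /** Simplified stand-in for Parquet's FilterPredicate. */
    static class FilterPredicate {
        final String column;
        final int lowerBound;

        FilterPredicate(String column, int lowerBound) {
            this.column = column;
            this.lowerBound = lowerBound;
        }
    }

    /** A source that accepts pushed-down filters in its native predicate type. */
    static class ParquetSource implements FilterableSource<FilterPredicate> {
        final List<FilterPredicate> pushedFilters = new ArrayList<>();

        @Override
        public void applyFilterable(Supplier<FilterPredicate> predicate) {
            pushedFilters.add(predicate.get());
        }
    }

    public static void main(String[] args) {
        ParquetSource source = new ParquetSource();
        // Table-planner side: an "ExpressionToPredicate" supplier that
        // lazily converts the planner's expression into the source's type.
        String expression = "amount > 100";
        Supplier<FilterPredicate> expressionToPredicate = () -> {
            String[] parts = expression.split(" > ");
            return new FilterPredicate(parts[0], Integer.parseInt(parts[1]));
        };
        source.applyFilterable(expressionToPredicate);
        FilterPredicate p = source.pushedFilters.get(0);
        System.out.println(p.column + ">" + p.lowerBound);
    }
}
```

The `Supplier` indirection is what keeps the source itself free of Table dependencies: only the supplier knows about planner expressions.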
> >>> The proposal looks really good to me and personally I would be
> >>> comfortable with voting on the current state.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>> On 23/03/2020 18:53, Timo Walther wrote:
> >>>> Hi everyone,
> >>>>
> >>>> I received some questions about how the new interfaces play together
> >>>> with formats and their factories.
> >>>>
> >>>> Furthermore, for MySQL or Postgres CDC logs, the format should be
> >>>> able to return a `ChangelogMode`.
> >>>>
> >>>> Also, I incorporated the feedback on the factory design in general.
> >>>>
> >>>> I added a new section `Factory Interfaces` to the design document.
> >>>> This should be helpful for understanding the big picture and
> >>>> connecting the concepts.
> >>>>
> >>>> Please let me know what you think.
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 18.03.20 13:43, Timo Walther wrote:
> >>>>> Hi Benchao,
> >>>>>
> >>>>> this is a very good question. I will update the FLIP about this.
> >>>>>
> >>>>> The legacy planner will not support the new interfaces. It will only
> >>>>> support the old interfaces. With the next release, I think the Blink
> >>>>> planner is stable enough to be the default one as well.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>> On 18.03.20 08:45, Benchao Li wrote:
> >>>>>> Hi Timo,
> >>>>>>
> >>>>>> Thank you and the others for the effort to prepare this FLIP.
> >>>>>>
> >>>>>> The FLIP LGTM generally.
> >>>>>>
> >>>>>> +1 for moving the Blink data structures to table-common; they will
> >>>>>> be useful for UDFs too in the future.
> >>>>>> A little question: do we plan to support the new interfaces and
> >>>>>> data types in the legacy planner, or only in the Blink planner?
> >>>>>>
> >>>>>> And using primary keys from the DDL instead of key information
> >>>>>> derived from each query is also a good idea;
> >>>>>> we have met some use cases where this did not work very well before.
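[Editor's note: Timo mentions above that for MySQL or Postgres CDC logs the format should be able to return a `ChangelogMode`. The idea can be sketched as below; the classes are simplified stand-ins mirroring the FLIP's concepts, not the actual Flink API.]

```java
import java.util.EnumSet;

public class ChangelogModeSketch {

    /** The kind of change a single row encodes. */
    enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** The set of row kinds a format (or source) can emit. */
    static class ChangelogMode {
        final EnumSet<RowKind> kinds;

        private ChangelogMode(EnumSet<RowKind> kinds) {
            this.kinds = kinds;
        }

        static ChangelogMode insertOnly() {
            return new ChangelogMode(EnumSet.of(RowKind.INSERT));
        }

        static ChangelogMode all() {
            return new ChangelogMode(EnumSet.allOf(RowKind.class));
        }
    }

    /** A decoding format advertises which changelog it produces, so the
     *  planner knows whether a Debezium-style CDC stream is involved. */
    interface DecodingFormat {
        ChangelogMode getChangelogMode();
    }

    public static void main(String[] args) {
        DecodingFormat json = ChangelogMode::insertOnly; // plain JSON: inserts only
        DecodingFormat debezium = ChangelogMode::all;    // CDC log: all row kinds
        System.out.println("json=" + json.getChangelogMode().kinds.size()
                + " debezium=" + debezium.getChangelogMode().kinds.size());
    }
}
```

With such a declaration, the planner can reject a CDC format on a sink that only accepts appends, instead of failing at runtime.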
> >>>>>>
> >>>>>> This FLIP also makes the dependencies of the table modules much
> >>>>>> clearer; I like it very much.
> >>>>>>
> >>>>>> Timo Walther <twal...@apache.org> wrote on Tue, Mar 17, 2020 at 1:36 AM:
> >>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> I'm happy to present the results of the long discussions that we
> >>>>>>> had internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, me, and
> >>>>>>> many more have contributed to this design document.
> >>>>>>>
> >>>>>>> We would like to propose new long-term table source and table sink
> >>>>>>> interfaces:
> >>>>>>>
> >>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> >>>>>>>
> >>>>>>> This is a requirement for FLIP-105 and for finalizing FLIP-32.
> >>>>>>>
> >>>>>>> The goals of this FLIP are:
> >>>>>>>
> >>>>>>> - Simplify the current interface architecture:
> >>>>>>>   - Merge upsert, retract, and append sinks.
> >>>>>>>   - Unify batch and streaming sources.
> >>>>>>>   - Unify batch and streaming sinks.
> >>>>>>>
> >>>>>>> - Allow sources to produce a changelog:
> >>>>>>>   - UpsertTableSources have been requested a lot by users. Now is
> >>>>>>>     the time to open up the internal planner capabilities via the
> >>>>>>>     new interfaces.
> >>>>>>>   - According to FLIP-105, we would like to support changelogs for
> >>>>>>>     processing formats such as Debezium.
> >>>>>>>
> >>>>>>> - Don't rely on the DataStream API for sources and sinks:
> >>>>>>>   - According to FLIP-32, the Table API and SQL should be
> >>>>>>>     independent of the DataStream API, which is why the
> >>>>>>>     `table-common` module has no dependencies on
> >>>>>>>     `flink-streaming-java`.
> >>>>>>>   - Source and sink implementations should only depend on the
> >>>>>>>     `table-common` module after FLIP-27.
> >>>>>>>   - Until FLIP-27 is ready, we still put most of the interfaces
> >>>>>>>     in `table-common` and strictly separate interfaces that
> >>>>>>>     communicate with a planner from actual runtime readers/writers.
> >>>>>>>
> >>>>>>> - Implement efficient sources and sinks without planner dependencies:
> >>>>>>>   - Make Blink's internal data structures available to connectors.
> >>>>>>>   - Introduce stable interfaces for data structures that can be
> >>>>>>>     marked as `@PublicEvolving`.
> >>>>>>>   - Only require dependencies on `flink-table-common` in the
> >>>>>>>     future.
> >>>>>>>
> >>>>>>> It finalizes the concept of dynamic tables and considers how all
> >>>>>>> source/sink related classes play together.
> >>>>>>>
> >>>>>>> We look forward to your feedback.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>
> >>>

--
Best,
Jingsong Lee
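[Editor's note: the first goal above, merging upsert, retract, and append sinks, can be illustrated with a single sink interface that declares which changelog it consumes, instead of three separate hierarchies. The names below are simplified stand-ins for the FLIP's concepts, not the actual Flink API.]

```java
import java.util.EnumSet;

public class UnifiedSinkSketch {

    enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** One sink interface replaces the append/retract/upsert split: the
     *  planner passes in the changelog the query produces, and the sink
     *  answers with the changelog it can actually consume. */
    interface DynamicTableSink {
        EnumSet<RowKind> getChangelogMode(EnumSet<RowKind> requested);
    }

    /** Append-style sink: consumes inserts only, whatever is requested. */
    static class AppendSink implements DynamicTableSink {
        public EnumSet<RowKind> getChangelogMode(EnumSet<RowKind> requested) {
            return EnumSet.of(RowKind.INSERT);
        }
    }

    /** Upsert-style sink: keys make UPDATE_BEFORE unnecessary, so it is
     *  dropped from the requested changelog; deletes are fine. */
    static class UpsertSink implements DynamicTableSink {
        public EnumSet<RowKind> getChangelogMode(EnumSet<RowKind> requested) {
            EnumSet<RowKind> accepted = EnumSet.copyOf(requested);
            accepted.remove(RowKind.UPDATE_BEFORE);
            return accepted;
        }
    }

    public static void main(String[] args) {
        EnumSet<RowKind> updatingQuery = EnumSet.allOf(RowKind.class);
        System.out.println("append=" + new AppendSink().getChangelogMode(updatingQuery).size()
                + " upsert=" + new UpsertSink().getChangelogMode(updatingQuery).size());
    }
}
```

The negotiation happens once at planning time, so batch and streaming sinks can share the same interface: a batch sink is simply one whose accepted changelog is insert-only.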