Hi Becket,

I don't think DataStream should see SQL-specific concepts such as filtering or computed columns. It's better to keep those within the SQL layer and translate them into more generic concepts when going down to the DataStream/runtime layer, e.g. using a MapFunction to represent computed-column logic, roughly as in the sketch below.
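For illustration, here is a minimal sketch of what I mean. The class name, schema, and column names (id, name) are made up; the point is only that by the time the runtime sees it, a computed column like `UPPER(name)` is just an ordinary MapFunction over Rows:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.types.Row;

// Hypothetical example: the planner translates the SQL computed column
// `upper_name AS UPPER(name)` into a plain MapFunction, so the
// DataStream/runtime layer never has to know about computed columns.
public class UpperNameComputedColumn implements MapFunction<Row, Row> {

    @Override
    public Row map(Row physicalRow) {
        // physical columns produced by the source: f0 = id, f1 = name
        Object id = physicalRow.getField(0);
        String name = (String) physicalRow.getField(1);
        // append the computed column as an additional field
        return Row.of(id, name, name == null ? null : name.toUpperCase());
    }
}
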
Best,
Kurt

On Tue, Mar 24, 2020 at 5:47 PM Becket Qin <becket....@gmail.com> wrote:

> Hi Timo and Dawid,
>
> It's really great that we have the same goal. I am actually wondering if
> we can go one step further and avoid some of the interfaces in Table as
> well.
>
> For example, if we have the FilterableSource, do we still need the
> FilterableTableSource? Should DynamicTableSource just become a
> Source<*Row*, SourceSplitT, EnumChkT>?
>
> Can you help me understand a bit more about the reason we need the
> following relational representation / wrapper interfaces vs. the
> interfaces that we could put on the Source in FLIP-27?
>
> DynamicTableSource vs. Source<Row, SourceSplitT, EnumChkT>
> SupportsFilterablePushDown vs. FilterableSource
> SupportsProjectablePushDown vs. ProjectableSource
> SupportsWatermarkPushDown vs. WithWatermarkAssigner
> SupportsComputedColumnPushDown vs. ComputedColumnDeserializer
> ScanTableSource vs. ChangeLogDeserializer
> LookUpTableSource vs. LookUpSource
>
> Assuming we have all the interfaces on the right side, do we still need
> the interfaces on the left side? Note that the interfaces on the right can
> be used by both DataStream and Table. If we do this, there will only be
> one set of Source interfaces for Table and DataStream; the only difference
> is that the Source for Table will have some specific plugins and
> configurations. An omnipotent Source can implement all the above
> interfaces and take a Deserializer that implements both
> ComputedColumnDeserializer and ChangeLogDeserializer.
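>
> To sketch what I mean (all of the class and interface names below are
> hypothetical and only meant to illustrate the idea, not a concrete
> proposal):
>
> // Hypothetical sketch: one source implementation serving both DataStream
> // and Table, differing only in the plugins it is configured with.
> class OmnipotentKafkaSource implements
>         Source<Row, KafkaSplit, KafkaEnumState>, // FLIP-27 base interface
>         FilterableSource<Expression>,            // instead of SupportsFilterablePushDown
>         ProjectableSource,                       // instead of SupportsProjectablePushDown
>         WithWatermarkAssigner {                  // instead of SupportsWatermarkPushDown
>
>     // For Table, the deserializer would additionally implement
>     // ComputedColumnDeserializer and ChangeLogDeserializer.
>     OmnipotentKafkaSource(Deserializer<Row> deserializer) {
>         ...
>     }
> }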
>
> Would the SQL planner work with that?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Mar 24, 2020 at 5:03 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> > +1. Thanks Timo for the design doc.
> >
> > We could also consider @Experimental, but I am +1 to @PublicEvolving; we
> > should be confident in the current change.
> >
> > Best,
> > Jingsong Lee
> >
> > On Tue, Mar 24, 2020 at 4:30 PM Timo Walther <twal...@apache.org> wrote:
> >
> > > @Becket: We totally agree that we don't need table-specific connectors
> > > during runtime. As Dawid said, the interfaces proposed here are just
> > > for communication with the planner. Once the properties (watermarks,
> > > computed columns, filters, projection, etc.) are negotiated, we can
> > > configure a regular Flink connector.
> > >
> > > E.g. setting the watermark assigner and deserialization schema of a
> > > Kafka connector.
> > >
> > > For better separation of concerns, Flink connectors should not include
> > > relational interfaces or depend on flink-table. This is the
> > > responsibility of the table source/sink.
> > >
> > > @Kurt: I would like to mark them @PublicEvolving already because we
> > > need to deprecate the old interfaces as early as possible. We cannot
> > > redirect to @Internal interfaces. They are not marked @Public, so we
> > > can still evolve them. But a core design shift should not happen again;
> > > it would leave a bad impression if we redesign over and over again.
> > > Instead, we should be confident in the current change.
> > >
> > > Regards,
> > > Timo
> > >
> > > On 24.03.20 09:20, Dawid Wysakowicz wrote:
> > > > Hi Becket,
> > > >
> > > > Answering your question, we have the same intention not to duplicate
> > > > connectors between the DataStream and Table APIs. The interfaces
> > > > proposed in the FLIP are a way to describe the relational properties
> > > > of a source. The intention is, as you described, to translate all of
> > > > those (expressed as expressions or other Table-specific structures)
> > > > into a DataStream source. In other words, I think what we are doing
> > > > here is in line with what you described.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > >
> > > > On 24/03/2020 02:23, Becket Qin wrote:
> > > >> Hi Timo,
> > > >>
> > > >> Thanks for the proposal. I completely agree that the current Table
> > > >> connectors could be simplified quite a bit. I haven't finished
> > > >> reading everything, but here are some quick thoughts.
> > > >>
> > > >> Actually, to me the biggest question is: why should there be two
> > > >> different connector systems for DataStream and Table? What is the
> > > >> fundamental reason that is preventing us from merging them into one?
> > > >>
> > > >> The basic functionality of a connector is to provide capabilities to
> > > >> do IO and Serde. Conceptually, Table connectors should just be
> > > >> DataStream connectors that are dealing with Rows. It seems that
> > > >> quite a few of the special connector requirements are just a
> > > >> specific way to do IO / Serde. Taking SupportsFilterPushDown as an
> > > >> example, imagine we have the following interface:
> > > >>
> > > >> interface FilterableSource<PREDICATE> {
> > > >>     void applyFilterable(Supplier<PREDICATE> predicate);
> > > >> }
> > > >>
> > > >> And if a ParquetSource would like to support filtering, it will
> > > >> become:
> > > >>
> > > >> class ParquetSource implements Source, FilterableSource<FilterPredicate> {
> > > >>     ...
> > > >> }
> > > >>
> > > >> For Table, one just needs to provide a predicate supplier that
> > > >> converts an Expression to the specified predicate type. This has a
> > > >> few benefits:
> > > >> 1. The same unified filtering API for sources, regardless of
> > > >> DataStream or Table.
> > > >> 2. DataStream users can now also use the ExpressionToPredicate
> > > >> supplier if they want to, as sketched below.
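> > > >>
> > > >> For instance, the Expression-to-predicate bridge could look roughly
> > > >> like this (the class name ExpressionToPredicate and its internals
> > > >> are hypothetical, just to illustrate the idea):
> > > >>
> > > >> // Hypothetical: converts the planner's Expressions into Parquet's
> > > >> // FilterPredicate, so the same FilterableSource interface serves
> > > >> // both Table and DataStream users.
> > > >> class ExpressionToPredicate implements Supplier<FilterPredicate> {
> > > >>
> > > >>     private final List<Expression> filterExpressions;
> > > >>
> > > >>     ExpressionToPredicate(List<Expression> filterExpressions) {
> > > >>         this.filterExpressions = filterExpressions;
> > > >>     }
> > > >>
> > > >>     @Override
> > > >>     public FilterPredicate get() {
> > > >>         // translate each relational Expression into a Parquet
> > > >>         // FilterPredicate and conjoin them
> > > >>         ...
> > > >>     }
> > > >> }
> > > >>
> > > >> // the planner would then simply call:
> > > >> parquetSource.applyFilterable(new ExpressionToPredicate(pushedFilters));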
> > > >>
> > > >> To summarize, my main point is that I am wondering if it is possible
> > > >> to have a single set of connector interfaces for both Table and
> > > >> DataStream, rather than having two hierarchies. I am not 100% sure
> > > >> if this would work, but if it works, this would be a huge win from
> > > >> both a code maintenance and a user experience perspective.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >> On Tue, Mar 24, 2020 at 2:03 AM Dawid Wysakowicz
> > > >> <dwysakow...@apache.org> wrote:
> > > >>
> > > >>> Hi Timo,
> > > >>>
> > > >>> Thank you for the proposal. I think it is an important improvement
> > > >>> that will benefit many parts of the Table API. The proposal looks
> > > >>> really good to me and personally I would be comfortable with voting
> > > >>> on the current state.
> > > >>>
> > > >>> Best,
> > > >>>
> > > >>> Dawid
> > > >>>
> > > >>> On 23/03/2020 18:53, Timo Walther wrote:
> > > >>>> Hi everyone,
> > > >>>>
> > > >>>> I received some questions around how the new interfaces play
> > > >>>> together with formats and their factories.
> > > >>>>
> > > >>>> Furthermore, for MySQL or Postgres CDC logs, the format should be
> > > >>>> able to return a `ChangelogMode`.
> > > >>>>
> > > >>>> Also, I incorporated the feedback around the factory design in
> > > >>>> general.
> > > >>>>
> > > >>>> I added a new section `Factory Interfaces` to the design document.
> > > >>>> This should be helpful for understanding the big picture and
> > > >>>> connecting the concepts.
> > > >>>>
> > > >>>> Please let me know what you think.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Timo
> > > >>>>
> > > >>>> On 18.03.20 13:43, Timo Walther wrote:
> > > >>>>> Hi Benchao,
> > > >>>>>
> > > >>>>> this is a very good question. I will update the FLIP accordingly.
> > > >>>>>
> > > >>>>> The legacy planner will not support the new interfaces. It will
> > > >>>>> only support the old interfaces. With the next release, I think
> > > >>>>> the Blink planner is stable enough to be the default one as well.
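> > > >>>>>
> > > >>>>> In other words, connectors built against the new interfaces are
> > > >>>>> only usable when the TableEnvironment is created with the Blink
> > > >>>>> planner, roughly like this (just the standard settings API, shown
> > > >>>>> here for completeness):
> > > >>>>>
> > > >>>>> StreamExecutionEnvironment env =
> > > >>>>>     StreamExecutionEnvironment.getExecutionEnvironment();
> > > >>>>> EnvironmentSettings settings = EnvironmentSettings.newInstance()
> > > >>>>>     .useBlinkPlanner()
> > > >>>>>     .inStreamingMode()
> > > >>>>>     .build();
> > > >>>>> StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);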
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> Timo
> > > >>>>>
> > > >>>>> On 18.03.20 08:45, Benchao Li wrote:
> > > >>>>>> Hi Timo,
> > > >>>>>>
> > > >>>>>> Thank you and others for the efforts to prepare this FLIP.
> > > >>>>>>
> > > >>>>>> The FLIP LGTM generally.
> > > >>>>>>
> > > >>>>>> +1 for moving the Blink data structures to table-common; they
> > > >>>>>> will be useful for UDFs too in the future.
> > > >>>>>> A little question: do we plan to support the new interfaces and
> > > >>>>>> data types in the legacy planner, or only in the Blink planner?
> > > >>>>>>
> > > >>>>>> And using primary keys from the DDL instead of key information
> > > >>>>>> derived from each query is also a good idea; we have met some
> > > >>>>>> use cases where the latter did not work very well.
> > > >>>>>>
> > > >>>>>> This FLIP also makes the dependencies of the table modules much
> > > >>>>>> clearer, I like it very much.
> > > >>>>>>
> > > >>>>>> Timo Walther <twal...@apache.org> wrote on Tue, Mar 17, 2020 at 1:36 AM:
> > > >>>>>>
> > > >>>>>>> Hi everyone,
> > > >>>>>>>
> > > >>>>>>> I'm happy to present the results of long discussions that we
> > > >>>>>>> had internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, I, and
> > > >>>>>>> many more have contributed to this design document.
> > > >>>>>>>
> > > >>>>>>> We would like to propose new long-term table source and table
> > > >>>>>>> sink interfaces:
> > > >>>>>>>
> > > >>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > > >>>>>>>
> > > >>>>>>> This is a requirement for FLIP-105 and finalizing FLIP-32.
> > > >>>>>>>
> > > >>>>>>> The goals of this FLIP are:
> > > >>>>>>>
> > > >>>>>>> - Simplify the current interface architecture:
> > > >>>>>>>     - Merge upsert, retract, and append sinks.
> > > >>>>>>>     - Unify batch and streaming sources.
> > > >>>>>>>     - Unify batch and streaming sinks.
> > > >>>>>>>
> > > >>>>>>> - Allow sources to produce a changelog (a rough sketch follows
> > > >>>>>>>   in the P.S. below):
> > > >>>>>>>     - UpsertTableSources have been requested a lot by users.
> > > >>>>>>>       Now is the time to open the internal planner capabilities
> > > >>>>>>>       via the new interfaces.
> > > >>>>>>>     - According to FLIP-105, we would like to support changelogs
> > > >>>>>>>       for processing formats such as Debezium.
> > > >>>>>>>
> > > >>>>>>> - Don't rely on the DataStream API for sources and sinks:
> > > >>>>>>>     - According to FLIP-32, the Table API and SQL should be
> > > >>>>>>>       independent of the DataStream API, which is why the
> > > >>>>>>>       `table-common` module has no dependencies on
> > > >>>>>>>       `flink-streaming-java`.
> > > >>>>>>>     - Source and sink implementations should only depend on the
> > > >>>>>>>       `table-common` module after FLIP-27.
> > > >>>>>>>     - Until FLIP-27 is ready, we still put most of the
> > > >>>>>>>       interfaces in `table-common` and strictly separate
> > > >>>>>>>       interfaces that communicate with the planner from the
> > > >>>>>>>       actual runtime readers/writers.
> > > >>>>>>>
> > > >>>>>>> - Implement efficient sources and sinks without planner
> > > >>>>>>>   dependencies:
> > > >>>>>>>     - Make Blink's internal data structures available to
> > > >>>>>>>       connectors.
> > > >>>>>>>     - Introduce stable interfaces for data structures that can
> > > >>>>>>>       be marked as `@PublicEvolving`.
> > > >>>>>>>     - Only require dependencies on `flink-table-common` in the
> > > >>>>>>>       future.
> > > >>>>>>>
> > > >>>>>>> It finalizes the concept of dynamic tables and considers how
> > > >>>>>>> all source/sink-related classes play together.
> > > >>>>>>>
> > > >>>>>>> We look forward to your feedback.
> > > >>>>>>>
> > > >>>>>>> Regards,
> > > >>>>>>> Timo
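> > > >>>>>>>
> > > >>>>>>> P.S. To give a feeling for the changelog goal, here is a very
> > > >>>>>>> rough sketch of how an insert-only scan source might describe
> > > >>>>>>> itself under the proposed interfaces (simplified and
> > > >>>>>>> abbreviated; please refer to the design document for the actual
> > > >>>>>>> interface definitions):
> > > >>>>>>>
> > > >>>>>>> // Simplified sketch, not the final API.
> > > >>>>>>> class MyScanTableSource implements ScanTableSource {
> > > >>>>>>>
> > > >>>>>>>     @Override
> > > >>>>>>>     public ChangelogMode getChangelogMode() {
> > > >>>>>>>         // this source only emits insertions; a CDC source
> > > >>>>>>>         // would additionally declare update/delete row kinds
> > > >>>>>>>         return ChangelogMode.insertOnly();
> > > >>>>>>>     }
> > > >>>>>>>
> > > >>>>>>>     // ... runtime provider and the remaining methods omitted
> > > >>>>>>> }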