Hi Becket,

To answer your question: we share the same intention not to duplicate connectors between the DataStream and Table APIs. The interfaces proposed in the FLIP are a way to describe the relational properties of a source. The intention is, as you described, to translate all of those properties, expressed as expressions or other Table-specific structures, into a DataStream source. In other words, I think what we are doing here is in line with what you described.
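[Editor's note: to make the filter-pushdown idea discussed below concrete, here is a minimal, self-contained Java sketch. It has no Flink dependencies; `InMemorySource`, `applyFilterable`, and `read` are hypothetical names standing in for a real source such as a ParquetSource, following the shape of Becket's quoted `FilterableSource` interface.]

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// The generic capability interface from the discussion: PREDICATE is
// whatever filter representation the concrete source can evaluate natively.
interface FilterableSource<PREDICATE> {
    void applyFilterable(Supplier<PREDICATE> predicate);
}

// A toy in-memory source standing in for e.g. a ParquetSource.
class InMemorySource implements FilterableSource<Predicate<Integer>> {
    private final List<Integer> records;
    private Predicate<Integer> filter = r -> true; // no filter by default

    InMemorySource(List<Integer> records) {
        this.records = records;
    }

    @Override
    public void applyFilterable(Supplier<Predicate<Integer>> predicate) {
        this.filter = predicate.get();
    }

    List<Integer> read() {
        return records.stream().filter(filter).collect(Collectors.toList());
    }
}

public class FilterPushdownSketch {
    public static void main(String[] args) {
        InMemorySource source = new InMemorySource(List.of(1, 5, 10, 20));
        // The Table planner would pass a supplier that translates an
        // Expression into the source's predicate type; a DataStream user
        // can hand over a predicate directly.
        source.applyFilterable(() -> r -> r >= 10);
        System.out.println(source.read()); // prints [10, 20]
    }
}
```

The point of the `Supplier` indirection is that the same source class serves both APIs: only the place that constructs the supplier differs.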
Best,

Dawid

On 24/03/2020 02:23, Becket Qin wrote:
> Hi Timo,
>
> Thanks for the proposal. I completely agree that the current Table
> connectors could be simplified quite a bit. I haven't finished reading
> everything, but here are some quick thoughts.
>
> Actually, to me the biggest question is why there should be two different
> connector systems for DataStream and Table. What is the fundamental reason
> that is preventing us from merging them into one?
>
> The basic functionality of a connector is to provide capabilities to do IO
> and Serde. Conceptually, Table connectors should just be DataStream
> connectors that deal with Rows. It seems that quite a few of the
> special connector requirements are just a specific way to do IO / Serde.
> Taking SupportsFilterPushDown as an example, imagine we have the following
> interface:
>
> interface FilterableSource<PREDICATE> {
>     void applyFilterable(Supplier<PREDICATE> predicate);
> }
>
> And if a ParquetSource would like to support filtering, it would become:
>
> class ParquetSource implements Source, FilterableSource<FilterPredicate> {
>     ...
> }
>
> For Table, one just needs to provide a predicate supplier that converts an
> Expression to the specified predicate type. This has a few benefits:
> 1. The same unified filtering API for sources, regardless of DataStream or
> Table.
> 2. DataStream users can now also use the ExpressionToPredicate
> supplier if they want to.
>
> To summarize, my main point is that I am wondering if it is possible to
> have a single set of connector interfaces for both Table and DataStream,
> rather than having two hierarchies. I am not 100% sure if this would work,
> but if it does, it would be a huge win from both a code maintenance and a
> user experience perspective.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Mar 24, 2020 at 2:03 AM Dawid Wysakowicz <dwysakow...@apache.org>
> wrote:
>
>> Hi Timo,
>>
>> Thank you for the proposal.
>> I think it is an important improvement that will benefit many parts of
>> the Table API. The proposal looks really good to me and personally I
>> would be comfortable with voting on the current state.
>>
>> Best,
>>
>> Dawid
>>
>> On 23/03/2020 18:53, Timo Walther wrote:
>>> Hi everyone,
>>>
>>> I received some questions around how the new interfaces play together
>>> with formats and their factories.
>>>
>>> Furthermore, for MySQL or Postgres CDC logs, the format should be able
>>> to return a `ChangelogMode`.
>>>
>>> Also, I incorporated the feedback around the factory design in general.
>>>
>>> I added a new section `Factory Interfaces` to the design document.
>>> This should be helpful for understanding the big picture and connecting
>>> the concepts.
>>>
>>> Please let me know what you think.
>>>
>>> Thanks,
>>> Timo
>>>
>>>
>>> On 18.03.20 13:43, Timo Walther wrote:
>>>> Hi Benchao,
>>>>
>>>> this is a very good question. I will update the FLIP about this.
>>>>
>>>> The legacy planner will not support the new interfaces. It will only
>>>> support the old interfaces. With the next release, I think the Blink
>>>> planner is stable enough to become the default one as well.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>> On 18.03.20 08:45, Benchao Li wrote:
>>>>> Hi Timo,
>>>>>
>>>>> Thank you and others for the efforts to prepare this FLIP.
>>>>>
>>>>> The FLIP LGTM generally.
>>>>>
>>>>> +1 for moving the Blink data structures to table-common; they will be
>>>>> useful for UDFs too in the future.
>>>>> A little question: do we plan to support the new interfaces and data
>>>>> types in the legacy planner, or only in the Blink planner?
>>>>>
>>>>> Using primary keys from the DDL instead of key information derived
>>>>> from each query is also a good idea; we have met some use cases where
>>>>> the derivation did not work very well.
>>>>>
>>>>> This FLIP also makes the dependencies of the table modules clearer; I
>>>>> like it very much.
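[Editor's note: Timo's remark above that "the format should be able to return a `ChangelogMode`" can be illustrated with a standalone sketch. The types below are minimal stand-ins modeled loosely on Flink's `RowKind`/`ChangelogMode` concepts; `DecodingFormat` and `DebeziumJsonFormat` here are simplified hypothetical versions, not the real Flink API.]

```java
import java.util.EnumSet;
import java.util.Set;

// Kinds of rows a changelog can carry (modeled on Flink's RowKind).
enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

// The set of row kinds a source or format can produce.
final class ChangelogMode {
    private final Set<RowKind> kinds;

    private ChangelogMode(Set<RowKind> kinds) {
        this.kinds = kinds;
    }

    static ChangelogMode insertOnly() {
        return new ChangelogMode(EnumSet.of(RowKind.INSERT));
    }

    static ChangelogMode all() {
        return new ChangelogMode(EnumSet.allOf(RowKind.class));
    }

    boolean contains(RowKind kind) {
        return kinds.contains(kind);
    }
}

// A decoding format declares which row kinds it emits, so the planner
// can validate the query and plan retractions/upserts accordingly.
interface DecodingFormat {
    ChangelogMode getChangelogMode();
}

// A CDC format such as Debezium carries inserts, updates, and deletes.
class DebeziumJsonFormat implements DecodingFormat {
    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.all();
    }
}
```

The planner can then reject, for example, a CDC format behind an append-only sink by inspecting the declared mode instead of failing at runtime.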
>>>>>
>>>>> Timo Walther <twal...@apache.org> wrote on Tuesday, March 17, 2020 at 1:36 AM:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I'm happy to present the results of long discussions that we had
>>>>>> internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, me, and many more
>>>>>> have contributed to this design document.
>>>>>>
>>>>>> We would like to propose new long-term table source and table sink
>>>>>> interfaces:
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>>>
>>>>>> This is a requirement for FLIP-105 and for finalizing FLIP-32.
>>>>>>
>>>>>> The goals of this FLIP are:
>>>>>>
>>>>>> - Simplify the current interface architecture:
>>>>>>     - Merge upsert, retract, and append sinks.
>>>>>>     - Unify batch and streaming sources.
>>>>>>     - Unify batch and streaming sinks.
>>>>>>
>>>>>> - Allow sources to produce a changelog:
>>>>>>     - UpsertTableSources have been requested a lot by users. Now is
>>>>>>       the time to open the internal planner capabilities via the new
>>>>>>       interfaces.
>>>>>>     - According to FLIP-105, we would like to support changelogs for
>>>>>>       processing formats such as Debezium.
>>>>>>
>>>>>> - Don't rely on the DataStream API for sources and sinks:
>>>>>>     - According to FLIP-32, the Table API and SQL should be
>>>>>>       independent of the DataStream API, which is why the
>>>>>>       `table-common` module has no dependencies on
>>>>>>       `flink-streaming-java`.
>>>>>>     - Source and sink implementations should only depend on the
>>>>>>       `table-common` module after FLIP-27.
>>>>>>     - Until FLIP-27 is ready, we still put most of the interfaces in
>>>>>>       `table-common` and strictly separate interfaces that
>>>>>>       communicate with the planner from the actual runtime
>>>>>>       readers/writers.
>>>>>>
>>>>>> - Implement efficient sources and sinks without planner dependencies:
>>>>>>     - Make Blink's internal data structures available to connectors.
>>>>>>     - Introduce stable interfaces for data structures that can be
>>>>>>       marked as `@PublicEvolving`.
>>>>>>     - Only require dependencies on `flink-table-common` in the
>>>>>>       future.
>>>>>>
>>>>>> The FLIP finalizes the concept of dynamic tables and considers how
>>>>>> all source/sink-related classes play together.
>>>>>>
>>>>>> We look forward to your feedback.
>>>>>>
>>>>>> Regards,
>>>>>> Timo