Accidentally sent an email that was not finished... YAML is much easier for users to use compared to SQL. Many external systems can use the YAML spec to build a data pipeline platform easily.
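For illustration, a minimal pipeline YAML in the shape of the Flink CDC quickstart might look like this (hostnames, credentials, and database/table names below are placeholders, not values from this thread):

```yaml
# Sketch of a Flink CDC 3.x pipeline definition: sync all tables in
# app_db from MySQL to Doris. No per-table schema is written here --
# the framework infers schemas and data types from the source.
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: flinkuser
  password: flinkpw
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```

Note how one job covers the whole database (the `tables` regex), which in SQL would require a separate INSERT statement per table.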
Best,
Shengkai

Shengkai Fang <fskm...@gmail.com> 于2024年12月3日周二 14:53写道:

> As far as I know, the Flink pipeline connector has the following benefits:
>
> 1. User-friendly:
>    * Schema inference: you don't need to write the schema in the YAML file;
>      the framework will convert the data types for users.
>    * YAML is much easier for users to use compared to SQL. Many external
>      systems can use YAML to build a
>
> 2. Enterprise-level features:
>    * Schema evolution: if the upstream table adds a new column, the YAML job
>      can update the downstream table's schema.
>    * Full DB sync: you can use one job to sync all tables in the upstream
>      database to the downstream. In SQL, you need to write multiple
>      statements to sync every table.
>
> Best,
> Shengkai
>
>
> Andrew Otto <o...@wikimedia.org> 于2024年12月3日周二 02:13写道:
>
>> Hi Robin!
>>
>> IIUC, the difference is:
>>
>> - Pipeline connectors can be used as a fully contained, YAML-configured
>>   CDC pipeline job
>>   <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/core-concept/data-pipeline/>
>> - Flink CDC sources are Flink Table connectors that can connect directly
>>   to source database tables and binlogs. They allow you to use
>>   Flink SQL / Table API to query external source databases. They are used
>>   internally by pipelines. E.g. the mysql-cdc connector is used by a
>>   source type: mysql pipeline connector.
>>
>> > is the point that Flink CDC provides CDC connectors, and they are
>> > documented here
>> > <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>> > when they could as logically be documented here
>> > <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>> > under the main Flink docs?
>>
>> Flink CDC connectors are Flink Table connectors, but specifically for
>> doing CDC. Compare that to e.g. the Flink JDBC table connector
>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/>,
>> which allows you to query a MySQL table with Flink, but won't read changes
>> in a streaming fashion. (IIUC, that is why the JDBC docs have a "Scan
>> Source: Bounded" heading.)
>>
>> I'm not an expert though, so please someone correct me if I am wrong!
>>
>>
>> On Mon, Dec 2, 2024 at 12:52 PM Robin Moffatt via user <
>> user@flink.apache.org> wrote:
>>
>>> I'm struggling to grok the difference between pipeline connectors
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/pipeline-connectors/overview/>
>>> and Flink sources
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>> in Flink CDC.
>>>
>>> I understand pipeline connectors, and have been through the quickstart,
>>> and they make sense.
>>>
>>> But how are Flink sources any different from what I'd build in Flink SQL
>>> itself directly? How do they fit into Flink CDC? Or is the point that
>>> Flink CDC provides CDC connectors, and they are documented here
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>> when they could as logically be documented here
>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>>> under the main Flink docs?
>>>
>>> Thanks in advance,
>>> Robin
>>>
>>
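To make the contrast in Andrew's reply concrete, a Flink CDC source used directly from Flink SQL is an ordinary Table connector definition. A minimal sketch with the mysql-cdc connector (hostname, credentials, and table/column names are placeholders, not from this thread):

```sql
-- A CDC source table using the 'mysql-cdc' Flink Table connector.
-- Unlike the 'jdbc' connector (a bounded scan source), this reads an
-- initial snapshot and then streams changes from the MySQL binlog.
CREATE TABLE orders (
    order_id      INT,
    customer_name STRING,
    price         DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector'     = 'mysql-cdc',
    'hostname'      = 'localhost',
    'port'          = '3306',
    'username'      = 'flinkuser',
    'password'      = 'flinkpw',
    'database-name' = 'mydb',
    'table-name'    = 'orders'
);

-- An unbounded, continuously updating query over the changelog:
SELECT * FROM orders;
```

This is the building block a pipeline's `source: { type: mysql }` uses internally; the YAML pipeline layers schema inference, schema evolution, and whole-database sync on top of it.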