Re: [Flink CDC] What's the difference between Pipeline connectors and Flink Source connectors?

Shengkai Fang Mon, 02 Dec 2024 22:53:52 -0800

As far as I know, Flink CDC pipeline connectors have the following benefits:

1. User-friendly:
   * Schema inference: you don't need to write the schema in the YAML file;
the framework converts the data types for you.
   * YAML is much easier to use than SQL. Many external systems can use
YAML to build a

2. Enterprise-level features:
   * Schema evolution: if a new column is added to the upstream table, a
YAML job can update the downstream table's schema accordingly.
   * Full database sync: a single job can sync all tables in the upstream
database to the downstream system. In SQL, you need to write separate
statements to sync each table.
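To make the full-database-sync point concrete, here is a minimal Python sketch that generates the Flink SQL you would otherwise write by hand with the `mysql-cdc` table connector. Everything specific in it is a made-up assumption for illustration: the database `app_db`, its tables and columns, the credentials, and the use of the `print` sink. Each table needs its own source DDL (with every column declared manually), a sink DDL, and an `INSERT` statement, so the amount of SQL grows with the number of tables, while a YAML pipeline describes the whole database with one `source`/`sink` definition.

```python
"""Hypothetical sketch: the Flink SQL a user would hand-write to sync every
table of one MySQL database with the mysql-cdc table connector.

Database, table, column names, credentials and the 'print' sink are
illustrative assumptions, not taken from any real setup.
"""
from typing import Dict, List, Tuple

# In Flink SQL every column must be declared by hand; a YAML pipeline
# infers the schema from the upstream database instead.
Schema = List[Tuple[str, str]]  # (column name, Flink SQL type)


def sql_for_table(db: str, table: str, schema: Schema, pk: str) -> List[str]:
    """Return the CREATE TABLE / INSERT statements needed for one table."""
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in schema)
    source = (
        f"CREATE TABLE src_{table} (\n  {cols},\n"
        f"  PRIMARY KEY (`{pk}`) NOT ENFORCED\n) WITH (\n"
        "  'connector' = 'mysql-cdc',\n"
        "  'hostname' = 'localhost',\n"
        "  'port' = '3306',\n"
        "  'username' = 'flink',\n"
        "  'password' = 'secret',\n"
        f"  'database-name' = '{db}',\n"
        f"  'table-name' = '{table}'\n)"
    )
    # Hypothetical sink: 'print' just logs the changelog rows.
    sink = f"CREATE TABLE sink_{table} (\n  {cols}\n) WITH ('connector' = 'print')"
    insert = f"INSERT INTO sink_{table} SELECT * FROM src_{table}"
    return [source, sink, insert]


def sql_for_database(db: str, tables: Dict[str, Tuple[Schema, str]]) -> List[str]:
    """Full-database sync in plain SQL: three statements per table."""
    stmts: List[str] = []
    for table, (schema, pk) in tables.items():
        stmts.extend(sql_for_table(db, table, schema, pk))
    return stmts


if __name__ == "__main__":
    tables = {
        "orders": ([("id", "INT"), ("price", "DECIMAL(10, 2)")], "id"),
        "products": ([("id", "INT"), ("name", "STRING")], "id"),
        "shipments": ([("id", "INT"), ("city", "STRING")], "id"),
    }
    statements = sql_for_database("app_db", tables)
    print(len(statements))  # 9 statements for 3 tables
    for stmt in statements:
        print(stmt + ";\n")
```

Adding a fourth table, or a column to an existing one, means editing this SQL by hand; that manual step is what schema inference, schema evolution and full database sync in a YAML pipeline are meant to remove.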

Best,
Shengkai


On Tue, Dec 3, 2024 at 02:13, Andrew Otto <o...@wikimedia.org> wrote:

> Hi Robin!
>
> IIUC, the difference is:
>
>
>    - Pipeline connectors can be used as a fully contained yaml configured
>    CDC pipeline job
>    
> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/core-concept/data-pipeline/>
>    - Flink CDC sources are Flink Table connectors that can connect
>    directly to source database tables and binlogs. They allow you to use
>    Flink SQL / Table API to query external source databases. They are also
>    used internally by pipelines; e.g., the mysql-cdc connector is used by
>    the pipeline connector with source type: mysql.
>
>
> > is the point that Flink CDC provides CDC connectors, and they are
> > documented here
> > <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
> > when they could as logically be documented here
> > <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
> > under the main Flink docs?
>
> Flink CDC connectors are Flink Table connectors, but specifically for
> doing CDC. Compare that to, e.g., the Flink JDBC table connector
> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/>,
> which allows you to query a MySQL table with Flink but won't read changes
> in a streaming fashion. (IIUC, that is why the JDBC docs have a "Scan
> Source: Bounded" heading.)
>
> I'm not an expert, though, so please correct me if I'm wrong!
>
> On Mon, Dec 2, 2024 at 12:52 PM Robin Moffatt via user <user@flink.apache.org> wrote:
>
>> I'm struggling to grok the difference between pipeline connectors
>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/pipeline-connectors/overview/>
>> and Flink sources
>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>> in Flink CDC.
>>
>> I understand pipeline connectors, and have been through the quickstart
>> and they make sense.
>>
>> But how are Flink sources any different from what I'd build in Flink SQL
>> itself directly? How do they fit into Flink CDC? Or is the point that Flink
>> CDC provides CDC connectors, and they are documented here
>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>> when they could as logically be documented here
>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>> under the main Flink docs?
>>
>> Thanks in advance,
>> Robin
>>
>
