Thanks Shengkai and Andrew - that's helped clarify things a lot.
On Tue, 3 Dec 2024 at 08:30, Shengkai Fang <fskm...@gmail.com> wrote:

> Accidentally sent an email that was not finished...
>
> YAML is much easier for users to work with than SQL. Many external
> systems can use a YAML spec to build a data pipeline platform easily.
>
> Best,
> Shengkai
>
> Shengkai Fang <fskm...@gmail.com> wrote on Tue, 3 Dec 2024 at 14:53:
>
>> As far as I know, Flink pipeline connectors have the following benefits:
>>
>> 1. User-friendly:
>>    * Schema inference: you don't need to write the schema in the YAML
>>      file; the framework converts the data types for you.
>>    * YAML is much easier for users to work with than SQL, and many
>>      external systems can use YAML to build a pipeline platform.
>>
>> 2. Enterprise-level features:
>>    * Schema evolution: if an upstream table adds a new column, a YAML
>>      job can update the downstream table's schema to match.
>>    * Full-DB sync: a single job can sync all tables in the upstream
>>      database to the downstream one. In SQL, you need to write a
>>      separate statement for every table.
>>
>> Best,
>> Shengkai
>>
>> Andrew Otto <o...@wikimedia.org> wrote on Tue, 3 Dec 2024 at 02:13:
>>
>>> Hi Robin!
>>>
>>> IIUC, the difference is:
>>>
>>> - Pipeline connectors can be used in a fully self-contained,
>>>   YAML-configured CDC pipeline job
>>>   <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/core-concept/data-pipeline/>
>>> - Flink CDC sources are Flink Table connectors that connect directly
>>>   to source database tables and binlogs. They allow you to use Flink
>>>   SQL / Table API to query external source databases, and they are
>>>   used internally by pipelines. E.g. the mysql-cdc connector is used
>>>   by a "source type: mysql" pipeline connector.
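(For my own notes while this is fresh: the YAML-configured pipeline job Andrew and Shengkai describe is defined entirely in one file. A minimal sketch, loosely based on the quickstart, with all hostnames, credentials, and table patterns hypothetical:)

```yaml
# Hypothetical Flink CDC data pipeline definition (values are placeholders).
# One file describes source, sink, and the pipeline itself -- no SQL, no
# per-table schema declarations: schemas are inferred and evolved for you.
source:
  type: mysql                 # uses the mysql-cdc connector internally
  hostname: mysql.example.com
  port: 3306
  username: flinkuser
  password: "secret"
  tables: app_db.\.*          # regex: sync every table in app_db (full-DB sync)

sink:
  type: doris                 # downstream system; could be another supported sink
  fenodes: doris.example.com:8030
  username: root
  password: ""

pipeline:
  name: Sync app_db from MySQL to Doris
  parallelism: 2
```

This is what makes the "full-DB sync" point above concrete: the `tables` regex covers the whole database in one job, where plain Flink SQL would need a `CREATE TABLE` plus `INSERT INTO` per table.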
>>>
>>> > is the point that Flink CDC provides CDC connectors, and they are
>>> > documented here
>>> > <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>> > when they could as logically be documented here
>>> > <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>>> > under the main Flink docs?
>>>
>>> Flink CDC connectors are Flink Table connectors, but specifically for
>>> doing CDC. Compare that to e.g. the Flink JDBC table connector
>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/>,
>>> which allows you to query a MySQL table with Flink, but won't read
>>> changes in a streaming fashion. (IIUC, that is why the JDBC docs have
>>> a "Scan Source: Bounded" heading.)
>>>
>>> I'm not an expert though, so please someone correct me if I am wrong!
>>>
>>> On Mon, Dec 2, 2024 at 12:52 PM Robin Moffatt via user <
>>> user@flink.apache.org> wrote:
>>>
>>>> I'm struggling to grok the difference between pipeline connectors
>>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/pipeline-connectors/overview/>
>>>> and Flink sources
>>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>>> in Flink CDC.
>>>>
>>>> I understand pipeline connectors; I have been through the quickstart
>>>> and they make sense.
>>>>
>>>> But how are Flink sources any different from what I'd build in Flink
>>>> SQL itself directly? How do they fit into Flink CDC? Or is the point
>>>> that Flink CDC provides CDC connectors, and they are documented here
>>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>>> when they could as logically be documented here
>>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>>>> under the main Flink docs?
>>>>
>>>> Thanks in advance,
>>>> Robin
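For anyone finding this thread later: using a Flink CDC source directly, per Andrew's explanation, just means declaring an ordinary Flink SQL table with the mysql-cdc connector. A rough sketch (table name, columns, and connection details are all hypothetical):

```sql
-- Hypothetical table backed by the mysql-cdc connector (placeholder values).
-- Reading from it yields an unbounded changelog stream from the MySQL binlog,
-- unlike the JDBC connector's one-off bounded scan.
CREATE TABLE orders (
  order_id INT,
  customer_name STRING,
  price DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql.example.com',
  'port' = '3306',
  'username' = 'flinkuser',
  'password' = 'secret',
  'database-name' = 'app_db',
  'table-name' = 'orders'
);

SELECT * FROM orders;  -- streams inserts/updates/deletes as they happen
```

Note this covers one table and requires the schema to be written out by hand, which is exactly the gap the YAML pipeline jobs (schema inference, schema evolution, full-DB sync) are meant to close.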