Accidentally sent an email that was not finished... YAML is much easier for users to use compared to SQL. Many external systems can use the YAML spec to build a data pipeline platform easily.
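For illustration, a minimal pipeline YAML in the shape of the Flink CDC quickstart might look like this (hostnames, credentials, and database/table names below are placeholders, not values from this thread):

```yaml
# Sketch of a Flink CDC 3.x pipeline definition: sync all tables in
# app_db from MySQL to Doris. No per-table schema is written here --
# the framework infers schemas and data types from the source.
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: flinkuser
  password: flinkpw
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```

Note how one job covers the whole database (the `tables` regex), which in SQL would require a separate INSERT statement per table.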
Best,
Shengkai

Shengkai Fang <fskm...@gmail.com> 于2024年12月3日周二 14:53写道:

> As far as I know, the Flink pipeline connector has the following benefits:
>
> 1. User-friendly:
>    * Schema inference: you don't need to write the schema in the YAML file;
>      the framework will convert the data types for users.
>    * YAML is much easier for users to use compared to SQL. Many external
>      systems can use YAML to build a
>
> 2. Enterprise-level features:
>    * Schema evolution: if the upstream table adds a new column, the YAML job
>      can update the downstream table's schema.
>    * Full DB sync: you can use one job to sync all tables in the upstream
>      database to the downstream. In SQL, you need to write multiple
>      statements to sync every table.
>
> Best,
> Shengkai
>
>
> Andrew Otto <o...@wikimedia.org> 于2024年12月3日周二 02:13写道:
>
>> Hi Robin!
>>
>> IIUC, the difference is:
>>
>> - Pipeline connectors can be used as a fully contained, YAML-configured
>>   CDC pipeline job
>>   <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/core-concept/data-pipeline/>
>> - Flink CDC sources are Flink Table connectors that can connect directly
>>   to source database tables and binlogs. They allow you to use
>>   Flink SQL / Table API to query external source databases. They are used
>>   internally by pipelines. E.g. the mysql-cdc connector is used by a
>>   source type: mysql pipeline connector.
>>
>> > is the point that Flink CDC provides CDC connectors, and they are
>> > documented here
>> > <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>> > when they could as logically be documented here
>> > <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>> > under the main Flink docs?
>>
>> Flink CDC connectors are Flink Table connectors, but specifically for
>> doing CDC. Compare that to e.g. the Flink JDBC table connector
>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/>,
>> which allows you to query a MySQL table with Flink, but won't read changes
>> in a streaming fashion. (IIUC, that is why the JDBC docs have a "Scan
>> Source: Bounded" heading.)
>>
>> I'm not an expert though, so please someone correct me if I am wrong!
>>
>>
>> On Mon, Dec 2, 2024 at 12:52 PM Robin Moffatt via user <
>> user@flink.apache.org> wrote:
>>
>>> I'm struggling to grok the difference between pipeline connectors
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/pipeline-connectors/overview/>
>>> and Flink sources
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>> in Flink CDC.
>>>
>>> I understand pipeline connectors, and have been through the quickstart,
>>> and they make sense.
>>>
>>> But how are Flink sources any different from what I'd build in Flink SQL
>>> itself directly? How do they fit into Flink CDC? Or is the point that
>>> Flink CDC provides CDC connectors, and they are documented here
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>> when they could as logically be documented here
>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>>> under the main Flink docs?
>>>
>>> Thanks in advance,
>>> Robin
>>>
>>
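To make the contrast in Andrew's reply concrete, a Flink CDC source used directly from Flink SQL is an ordinary Table connector definition. A minimal sketch with the mysql-cdc connector (hostname, credentials, and table/column names are placeholders, not from this thread):

```sql
-- A CDC source table using the 'mysql-cdc' Flink Table connector.
-- Unlike the 'jdbc' connector (a bounded scan source), this reads an
-- initial snapshot and then streams changes from the MySQL binlog.
CREATE TABLE orders (
    order_id      INT,
    customer_name STRING,
    price         DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector'     = 'mysql-cdc',
    'hostname'      = 'localhost',
    'port'          = '3306',
    'username'      = 'flinkuser',
    'password'      = 'flinkpw',
    'database-name' = 'mydb',
    'table-name'    = 'orders'
);

-- An unbounded, continuously updating query over the changelog:
SELECT * FROM orders;
```

This is the building block a pipeline's `source: { type: mysql }` uses internally; the YAML pipeline layers schema inference, schema evolution, and whole-database sync on top of it.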