Thanks Shengkai and Andrew - that's helped clarify things a lot.



On Tue, 3 Dec 2024 at 08:30, Shengkai Fang <fskm...@gmail.com> wrote:

> Accidentally sent an email that was not finished...
>
> YAML is much easier for users to work with than SQL, and many external
> systems can use the YAML spec to build a data pipeline platform easily.
>
> Best,
> Shengkai
>
>
>
> Shengkai Fang <fskm...@gmail.com> wrote on Tue, 3 Dec 2024 at 14:53:
>
>> As far as I know, Flink pipeline connector has the following benefits:
>>
>> 1. User-friendly:
>>   * Schema inference: you don't need to write the schema in the yaml
>> file; the framework converts the data types for you.
>>   * YAML is much easier for users to use compared to SQL. Many external
>> systems can use YAML to build a data pipeline platform easily.
>>
>> 2. Enterprise-level features:
>>    * Schema evolution: if the upstream table adds a new column, the yaml
>> job can update the downstream table's schema to match.
>>    * Full DB sync: you can use a single job to sync all tables in the
>> upstream database to the downstream one. In SQL, you need to write a
>> separate statement to sync every table.
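>>
>> As a sketch, a full-database sync pipeline yaml might look roughly like
>> this (the connection details and the choice of sink are made-up
>> examples, not a tested config):
>>
>>   source:
>>     type: mysql
>>     hostname: localhost
>>     port: 3306
>>     username: flink
>>     password: "****"
>>     tables: app_db.\.*   # regex: sync every table in app_db
>>
>>   sink:
>>     type: doris
>>     fenodes: 127.0.0.1:8030
>>
>>   pipeline:
>>     name: Sync app_db to Doris
>>     parallelism: 2
>>
>> Note that no schemas are declared anywhere: the framework infers them
>> and propagates upstream schema changes to the sink.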
>>
>> Best,
>> Shengkai
>>
>>
>> Andrew Otto <o...@wikimedia.org> wrote on Tue, 3 Dec 2024 at 02:13:
>>
>>> Hi Robin!
>>>
>>> IIUC, the difference is:
>>>
>>>
>>>    - Pipeline connectors can be used as a fully contained yaml
>>>    configured CDC pipeline job
>>>    
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/core-concept/data-pipeline/>
>>>    - Flink CDC sources are Flink Table connectors that can connect
>>>    directly to source database tables and binlogs.  They allow you to use
>>>    Flink SQL / Table API to query external source databases.  They are
>>>    used internally by pipelines, e.g. the mysql-cdc connector is used by
>>>    a pipeline source of type: mysql.
>>>
>>>
>>> > is the point that Flink CDC provides CDC connectors, and they are
>>> documented here
>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>>  when
>>> they could as logically be documented here
>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>>>  under
>>> the main Flink docs?
>>>
>>> Flink CDC connectors are Flink Table connectors, but specifically for
>>> doing CDC.  Compare that to e.g. the Flink JDBC table connector
>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/>,
>>> which allows you to query a MySQL table with Flink, but won't read changes
>>> in a streaming fashion.  (IIUC, that is why the JDBC docs have a "Scan
>>> Source: Bounded" heading.)
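>>>
>>> As a rough illustration (table and connection details are made up), a
>>> streaming CDC table registered with the mysql-cdc connector in Flink
>>> SQL looks like:
>>>
>>>   CREATE TABLE orders (
>>>     order_id INT,
>>>     amount DECIMAL(10, 2),
>>>     PRIMARY KEY (order_id) NOT ENFORCED
>>>   ) WITH (
>>>     'connector' = 'mysql-cdc',
>>>     'hostname' = 'localhost',
>>>     'port' = '3306',
>>>     'username' = 'flink',
>>>     'password' = '****',
>>>     'database-name' = 'app_db',
>>>     'table-name' = 'orders'
>>>   );
>>>
>>> SELECT * FROM orders then emits a continuous changelog (inserts,
>>> updates, deletes) rather than a one-off bounded snapshot.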
>>>
>>> I'm not an expert though, so please someone correct me if I am wrong!
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Dec 2, 2024 at 12:52 PM Robin Moffatt via user <
>>> user@flink.apache.org> wrote:
>>>
>>>> I'm struggling to grok the difference between pipeline connectors
>>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/pipeline-connectors/overview/>
>>>> and Flink sources
>>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>>>  in
>>>> Flink CDC.
>>>>
>>>> I understand pipeline connectors, and have been through the quickstart
>>>> and they make sense.
>>>>
>>>> But how are Flink sources any different from what I'd build in Flink
>>>> SQL itself directly? How do they fit into Flink CDC? Or is the point that
>>>> Flink CDC provides CDC connectors, and they are documented here
>>>> <https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/>
>>>> when they could as logically be documented here
>>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/>
>>>>  under
>>>> the main Flink docs?
>>>>
>>>> Thanks in advance,
>>>> Robin
>>>>
>>>
