This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new f7f88f039b9 [doc]Flink connector document adds schema-change-mode parameter configuration (#891)
f7f88f039b9 is described below

commit f7f88f039b97eac54f61676da115496f401e49a7
Author: wudongliang <46414265+donglian...@users.noreply.github.com>
AuthorDate: Thu Aug 1 17:14:32 2024 +0800

    [doc]Flink connector document adds schema-change-mode parameter configuration (#891)
---
 docs/ecosystem/flink-doris-connector.md            | 39 +++++++++++----------
 .../current/ecosystem/flink-doris-connector.md     | 39 +++++++++++----------
 .../version-2.1/ecosystem/flink-doris-connector.md | 40 +++++++++++-----------
 .../version-3.0/ecosystem/flink-doris-connector.md |  3 +-
 .../version-2.1/ecosystem/flink-doris-connector.md | 39 +++++++++++----------
 .../version-3.0/ecosystem/flink-doris-connector.md |  1 +
 6 files changed, 83 insertions(+), 78 deletions(-)

diff --git a/docs/ecosystem/flink-doris-connector.md b/docs/ecosystem/flink-doris-connector.md
index 264bc8b1a4e..e50297e7899 100644
--- a/docs/ecosystem/flink-doris-connector.md
+++ b/docs/ecosystem/flink-doris-connector.md
@@ -526,26 +526,27 @@ insert into doris_sink select id,name,bank,age from cdc_mysql_source;
      [--table-conf <doris-table-conf> [--table-conf <doris-table-conf> ...]]
  ```
-| Key | Comment |
-| ----------------------- | ------------------------------------------------------------ |
-| --job-name | Flink task name, optional |
-| --database | Database name synchronized to Doris |
-| --table-prefix | Doris table prefix name, such as --table-prefix ods_. |
-| --table-suffix | Same as above, the suffix name of the Doris table. |
-| --including-tables | For MySQL tables that need to be synchronized, you can use "|" to separate multiple tables and support regular expressions. For example --including-tables table1 |
-| --excluding-tables | For tables that do not need to be synchronized, the usage is the same as above. |
+| Key | Comment [...]
+| ----------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...] +| --job-name | Flink task name, optional [...] +| --database | Database name synchronized to Doris [...] +| --table-prefix | Doris table prefix name, such as --table-prefix ods_. [...] +| --table-suffix | Same as above, the suffix name of the Doris table. [...] +| --including-tables | For MySQL tables that need to be synchronized, you can use "|" to separate multiple tables and support regular expressions. For example --including-tables table1 [...] +| --excluding-tables | For tables that do not need to be synchronized, the usage is the same as above. [...] | --mysql-conf | MySQL CDCSource configuration, for example --mysql-conf hostname=127.0.0.1, you can find it [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/mysql-cdc/) View all configurations MySQL-CDC, where hostname/username/password/database-name is required. When the synchronized library table contains a non-primary key table, `scan.incremental.snapshot.chunk.key-column` must be set, and only one field of non- [...] -| --oracle-conf | Oracle CDCSource configuration, for example --oracle-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/) View all configurations Oracle-CDC, where hostname/username/password/database-name/schema-name is required. | -| --postgres-conf | Postgres CDCSource configuration, e.g. 
--postgres-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/) View all configurations Postgres-CDC where hostname/username/password/database-name/schema-name/slot.name is required. | -| --sqlserver-conf | SQLServer CDCSource configuration, for example --sqlserver-conf hostname=127.0.0.1, you can find it [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/) View all configurations SQLServer-CDC, where hostname/username/password/database-name/schema-name is required. | -| --sink-conf | All configurations of Doris Sink can be found [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9) View the complete configuration items. | -| --table-conf | The configuration items of the Doris table(The exception is table-buckets, non-properties attributes), that is, the content contained in properties. For example `--table-conf replication_num=1`, and the `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"` option specifies the number of buckets for different tables based on the order of regular expressions. If there is no match, the table is created with the default setting of BUCKETS AUTO. | -| --ignore-default-value | Turn off the default value of synchronizing mysql table structure. It is suitable for synchronizing mysql data to doris when the field has a default value but the actual inserted data is null. Reference [here](https://github.com/apache/doris-flink-connector/pull/152) | -| --use-new-schema-change | Whether to use the new schema change to support synchronization of MySQL multi-column changes and default values. since version 1.6.0, the default value has been set to true. 
Reference [here](https://github.com/apache/doris-flink-connector/pull/167) | -| --single-sink | Whether to use a single Sink to synchronize all tables. When turned on, newly created tables in the upstream can also be automatically recognized and tables automatically created. | -| --multi-to-one-origin | When writing multiple upstream tables into the same table, the configuration of the source table, for example: --multi-to-one-origin="a\_.\*|b_.\*", Reference [here](https://github.com/apache/doris-flink-connector/pull/208) | -| --multi-to-one-target | Used with multi-to-one-origin, the configuration of the target table, such as: --multi-to-one-target="a\|b" | -| --create-table-only | Whether only the table schema should be synchronized | +| --oracle-conf | Oracle CDCSource configuration, for example --oracle-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/) View all configurations Oracle-CDC, where hostname/username/password/database-name/schema-name is required. [...] +| --postgres-conf | Postgres CDCSource configuration, e.g. --postgres-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/) View all configurations Postgres-CDC where hostname/username/password/database-name/schema-name/slot.name is required. [...] +| --sqlserver-conf | SQLServer CDCSource configuration, for example --sqlserver-conf hostname=127.0.0.1, you can find it [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/) View all configurations SQLServer-CDC, where hostname/username/password/database-name/schema-name is required. [...] 
+| --sink-conf | All configurations of Doris Sink can be found [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9) View the complete configuration items. [...] +| --table-conf | The configuration items of the Doris table(The exception is table-buckets, non-properties attributes), that is, the content contained in properties. For example `--table-conf replication_num=1`, and the `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"` option specifies the number of buckets for different tables based on the order of regular expressions. If there is no match, the table is created with the default setting of BUCKETS AUTO. [...] +| --ignore-default-value | Turn off the default value of synchronizing mysql table structure. It is suitable for synchronizing mysql data to doris when the field has a default value but the actual inserted data is null. Reference [here](https://github.com/apache/doris-flink-connector/pull/152) [...] +| --use-new-schema-change | Whether to use the new schema change to support synchronization of MySQL multi-column changes and default values. since version 1.6.0, the default value has been set to true. Reference [here](https://github.com/apache/doris-flink-connector/pull/167) [...] +| --schema-change-mode | The mode for parsing schema change supports two parsing modes: `debezium_structure` and `sql_parser`. The default mode is `debezium_structure`. <br/><br/> `debezium_structure` parses the data structure used when upstream CDC synchronizes data, and determines DDL change operations by parsing this structure. <br/> `sql_parser` determines the DDL change operation by parsing the DDL statement when the upstream CDC synchronizes data, so this parsing mode is more ac [...] +| --single-sink | Whether to use a single Sink to synchronize all tables. When turned on, newly created tables in the upstream can also be automatically recognized and tables automatically created. 
[...] +| --multi-to-one-origin | When writing multiple upstream tables into the same table, the configuration of the source table, for example: --multi-to-one-origin="a\_.\*|b_.\*", Reference [here](https://github.com/apache/doris-flink-connector/pull/208) [...] +| --multi-to-one-target | Used with multi-to-one-origin, the configuration of the target table, such as: --multi-to-one-target="a\|b" [...] +| --create-table-only | Whether only the table schema should be synchronized [...] >Note: When synchronizing, you need to add the corresponding Flink CDC >dependencies in the $FLINK_HOME/lib directory, such as >flink-sql-connector-mysql-cdc-${version}.jar, >flink-sql-connector-oracle-cdc-${version}.jar diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md index 39f631eb363..cc52a0cb8f6 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md @@ -532,26 +532,27 @@ insert into doris_sink select id,name,bank,age from cdc_mysql_source; -| Key | Comment | -| ----------------------- | ------------------------------------------------------------ | -| --job-name | Flink 任务名称,非必需 | -| --database | 同步到 Doris 的数据库名 | -| --table-prefix | Doris 表前缀名,例如 --table-prefix ods_。 | -| --table-suffix | 同上,Doris 表的后缀名。 | -| --including-tables | 需要同步的 MySQL 表,可以使用"\|" 分隔多个表,并支持正则表达式。比如--including-tables table1 | -| --excluding-tables | 不需要同步的表,用法同上。 | +| Key | Comment | 
+|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| --job-name | Flink 任务名称,非必需 | +| --database | 同步到 Doris 的数据库名 | +| --table-prefix | Doris 表前缀名,例如 --table-prefix ods_。 | +| --table-suffix | 同上,Doris 表的后缀名。 | +| --including-tables | 需要同步的 MySQL 表,可以使用"\|" 分隔多个表,并支持正则表达式。比如--including-tables table1 | +| --excluding-tables | 不需要同步的表,用法同上。 | | --mysql-conf | MySQL CDCSource 配置,例如--mysql-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/mysql-cdc/)查看所有配置 MySQL-CDC,其中 hostname/username/password/database-name 是必需的。同步的库表中含有非主键表时,必须设置 `scan.incremental.snapshot.chunk.key-column`,且只能选择非空类型的一个字段。<br/>例如:`scan.incremental.snapshot.chunk.key-column=database.table:column,database.table1:column...`,不同的库表列之间用`,`隔开。 | -| --oracle-conf | Oracle CDCSource 配置,例如--oracle-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/)查看所有配置 Oracle-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | -| --postgres-conf | Postgres CDCSource 配置,例如--postgres-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/)查看所有配置 Postgres-CDC,其中 hostname/username/password/database-name/schema-name/slot.name 是必需的。 | -| --sqlserver-conf | SQLServer CDCSource 配置,例如--sqlserver-conf 
hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/)查看所有配置 SQLServer-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | -| --sink-conf | Doris Sink 的所有配置,可以在[这里](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9)查看完整的配置项。 | -| --table-conf | Doris 表的配置项,即 properties 中包含的内容(其中 table-buckets 例外,非 properties 属性)。例如 `--table-conf replication_num=1`,而 `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"`表示按照正则表达式顺序指定不同表的 buckets 数量,如果没有匹配到则采用 BUCKETS AUTO 建表。 | -| --ignore-default-value | 关闭同步 mysql 表结构的默认值。适用于同步 mysql 数据到 doris 时,字段有默认值,但实际插入数据为 null 情况。参考[#152](https://github.com/apache/doris-flink-connector/pull/152) | -| --use-new-schema-change | 是否使用新的 schema change,支持同步 mysql 多列变更、默认值,1.6.0 开始该参数默认为true。参考[#167](https://github.com/apache/doris-flink-connector/pull/167) | -| --single-sink | 是否使用单个 Sink 同步所有表,开启后也可自动识别上游新创建的表,自动创建表。 | -| --multi-to-one-origin | 将上游多张表写入同一张表时,源表的配置,比如:--multi-to-one-origin="a\_.\*\|b_.\*",具体参考[#208](https://github.com/apache/doris-flink-connector/pull/208) | -| --multi-to-one-target | 与 multi-to-one-origin 搭配使用,目标表的配置,比如:--multi-to-one-target="a\|b" | -| --create-table-only | 是否只仅仅同步表结构 +| --oracle-conf | Oracle CDCSource 配置,例如--oracle-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/)查看所有配置 Oracle-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | +| --postgres-conf | Postgres CDCSource 配置,例如--postgres-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/)查看所有配置 Postgres-CDC,其中 hostname/username/password/database-name/schema-name/slot.name 是必需的。 | +| --sqlserver-conf | SQLServer CDCSource 配置,例如--sqlserver-conf 
hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/)查看所有配置 SQLServer-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | +| --sink-conf | Doris Sink 的所有配置,可以在[这里](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9)查看完整的配置项。 | +| --table-conf | Doris 表的配置项,即 properties 中包含的内容(其中 table-buckets 例外,非 properties 属性)。例如 `--table-conf replication_num=1`,而 `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"`表示按照正则表达式顺序指定不同表的 buckets 数量,如果没有匹配到则采用 BUCKETS AUTO 建表。 | +| --ignore-default-value | 关闭同步 mysql 表结构的默认值。适用于同步 mysql 数据到 doris 时,字段有默认值,但实际插入数据为 null 情况。参考[#152](https://github.com/apache/doris-flink-connector/pull/152) | +| --use-new-schema-change | 是否使用新的 schema change,支持同步 mysql 多列变更、默认值,1.6.0 开始该参数默认为true。参考[#167](https://github.com/apache/doris-flink-connector/pull/167) | +| --schema-change-mode | 解析 schema change 的模式,支持 `debezium_structure`、`sql_parser` 两种解析模式,默认采用 `debezium_structure` 模式。<br/><br/> `debezium_structure` 解析上游 CDC 同步数据时所使用的数据结构,通过解析该结构判断 DDL 变更操作。 <br/> `sql_parser` 通过解析上游 CDC 同步数据时的 DDL 语句,从而判断 DDL 变更操作,因此该解析模式更加准确。<br/> 使用例子:`--schema-change-mode debezium_structure`<br/> 本功能将在 1.6.2.1 后的版本中提供 | +| --single-sink | 是否使用单个 Sink 同步所有表,开启后也可自动识别上游新创建的表,自动创建表。 | +| --multi-to-one-origin | 将上游多张表写入同一张表时,源表的配置,比如:--multi-to-one-origin="a\_.\*\|b_.\*",具体参考[#208](https://github.com/apache/doris-flink-connector/pull/208) | +| --multi-to-one-target | 与 multi-to-one-origin 搭配使用,目标表的配置,比如:--multi-to-one-target="a\|b" | +| --create-table-only | 是否只仅仅同步表的结构 | >注:同步时需要在$FLINK_HOME/lib 目录下添加对应的 Flink CDC 依赖,比如 >flink-sql-connector-mysql-cdc-${version}.jar,flink-sql-connector-oracle-cdc-${version}.jar diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md index 2f782902400..f9b5498d886 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md @@ -531,27 +531,27 @@ insert into doris_sink select id,name,bank,age from cdc_mysql_source; ``` - -| Key | Comment | -| ----------------------- | ------------------------------------------------------------ | -| --job-name | Flink 任务名称,非必需 | -| --database | 同步到 Doris 的数据库名 | -| --table-prefix | Doris 表前缀名,例如 --table-prefix ods_。 | -| --table-suffix | 同上,Doris 表的后缀名。 | -| --including-tables | 需要同步的 MySQL 表,可以使用"\|" 分隔多个表,并支持正则表达式。比如--including-tables table1 | -| --excluding-tables | 不需要同步的表,用法同上。 | +| Key | Comment | +|-------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| --job-name | Flink 任务名称,非必需 | +| --database | 同步到 Doris 的数据库名 | +| --table-prefix | Doris 表前缀名,例如 --table-prefix ods_。 | +| --table-suffix | 同上,Doris 表的后缀名。 | +| --including-tables | 需要同步的 MySQL 表,可以使用"\|" 分隔多个表,并支持正则表达式。比如--including-tables table1 | +| --excluding-tables | 不需要同步的表,用法同上。 | | --mysql-conf | MySQL CDCSource 配置,例如--mysql-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/mysql-cdc/)查看所有配置 MySQL-CDC,其中 hostname/username/password/database-name 是必需的。同步的库表中含有非主键表时,必须设置 
`scan.incremental.snapshot.chunk.key-column`,且只能选择非空类型的一个字段。<br/>例如:`scan.incremental.snapshot.chunk.key-column=database.table:column,database.table1:column...`,不同的库表列之间用`,`隔开。 | -| --oracle-conf | Oracle CDCSource 配置,例如--oracle-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/)查看所有配置 Oracle-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | -| --postgres-conf | Postgres CDCSource 配置,例如--postgres-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/)查看所有配置 Postgres-CDC,其中 hostname/username/password/database-name/schema-name/slot.name 是必需的。 | -| --sqlserver-conf | SQLServer CDCSource 配置,例如--sqlserver-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/)查看所有配置 SQLServer-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | -| --sink-conf | Doris Sink 的所有配置,可以在[这里](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9)查看完整的配置项。 | -| --table-conf | Doris 表的配置项,即 properties 中包含的内容(其中 table-buckets 例外,非 properties 属性)。例如 `--table-conf replication_num=1`,而 `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"`表示按照正则表达式顺序指定不同表的 buckets 数量,如果没有匹配到则采用 BUCKETS AUTO 建表。 | -| --ignore-default-value | 关闭同步 mysql 表结构的默认值。适用于同步 mysql 数据到 doris 时,字段有默认值,但实际插入数据为 null 情况。参考[#152](https://github.com/apache/doris-flink-connector/pull/152) | -| --use-new-schema-change | 是否使用新的 schema change,支持同步 mysql 多列变更、默认值。1.6.0 默认为true。参考[#167](https://github.com/apache/doris-flink-connector/pull/167) | -| --single-sink | 是否使用单个 Sink 同步所有表,开启后也可自动识别上游新创建的表,自动创建表。 | -| --multi-to-one-origin | 
将上游多张表写入同一张表时,源表的配置,比如:--multi-to-one-origin="a\_.\*\|b_.\*",具体参考[#208](https://github.com/apache/doris-flink-connector/pull/208) | -| --multi-to-one-target | 与 multi-to-one-origin 搭配使用,目标表的配置,比如:--multi-to-one-target="a\|b" | -| --create-table-only | 是否只仅仅创建表的结构 +| --oracle-conf | Oracle CDCSource 配置,例如--oracle-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/)查看所有配置 Oracle-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | +| --postgres-conf | Postgres CDCSource 配置,例如--postgres-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/)查看所有配置 Postgres-CDC,其中 hostname/username/password/database-name/schema-name/slot.name 是必需的。 | +| --sqlserver-conf | SQLServer CDCSource 配置,例如--sqlserver-conf hostname=127.0.0.1,您可以在[这里](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/)查看所有配置 SQLServer-CDC,其中 hostname/username/password/database-name/schema-name 是必需的。 | +| --sink-conf | Doris Sink 的所有配置,可以在[这里](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9)查看完整的配置项。 | +| --table-conf | Doris 表的配置项,即 properties 中包含的内容(其中 table-buckets 例外,非 properties 属性)。例如 `--table-conf replication_num=1`,而 `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"`表示按照正则表达式顺序指定不同表的 buckets 数量,如果没有匹配到则采用 BUCKETS AUTO 建表。 | +| --ignore-default-value | 关闭同步 mysql 表结构的默认值。适用于同步 mysql 数据到 doris 时,字段有默认值,但实际插入数据为 null 情况。参考[#152](https://github.com/apache/doris-flink-connector/pull/152) | +| --use-new-schema-change | 是否使用新的 schema change,支持同步 mysql 多列变更、默认值,1.6.0 开始该参数默认为true。参考[#167](https://github.com/apache/doris-flink-connector/pull/167) | +| --schema-change-mode | 解析 schema change 的模式,支持 `debezium_structure`、`sql_parser` 两种解析模式,默认采用 
`debezium_structure` 模式。<br/><br/> `debezium_structure` 解析上游 CDC 同步数据时所使用的数据结构,通过解析该结构判断 DDL 变更操作。 <br/> `sql_parser` 通过解析上游 CDC 同步数据时的 DDL 语句,从而判断 DDL 变更操作,因此该解析模式更加准确。<br/> 使用例子:`--schema-change-mode debezium_structure`<br/> 本功能将在 1.6.2.1 后的版本中提供 | +| --single-sink | 是否使用单个 Sink 同步所有表,开启后也可自动识别上游新创建的表,自动创建表。 | +| --multi-to-one-origin | 将上游多张表写入同一张表时,源表的配置,比如:--multi-to-one-origin="a\_.\*\|b_.\*",具体参考[#208](https://github.com/apache/doris-flink-connector/pull/208) | +| --multi-to-one-target | 与 multi-to-one-origin 搭配使用,目标表的配置,比如:--multi-to-one-target="a\|b" | +| --create-table-only | 是否只仅仅创建表的结构 | >注:同步时需要在$FLINK_HOME/lib 目录下添加对应的 Flink CDC 依赖,比如 >flink-sql-connector-mysql-cdc-${version}.jar,flink-sql-connector-oracle-cdc-${version}.jar diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md index 39f631eb363..c229009ebc7 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/ecosystem/flink-doris-connector.md @@ -548,10 +548,11 @@ insert into doris_sink select id,name,bank,age from cdc_mysql_source; | --table-conf | Doris 表的配置项,即 properties 中包含的内容(其中 table-buckets 例外,非 properties 属性)。例如 `--table-conf replication_num=1`,而 `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"`表示按照正则表达式顺序指定不同表的 buckets 数量,如果没有匹配到则采用 BUCKETS AUTO 建表。 | | --ignore-default-value | 关闭同步 mysql 表结构的默认值。适用于同步 mysql 数据到 doris 时,字段有默认值,但实际插入数据为 null 情况。参考[#152](https://github.com/apache/doris-flink-connector/pull/152) | | --use-new-schema-change | 是否使用新的 schema change,支持同步 mysql 多列变更、默认值,1.6.0 开始该参数默认为true。参考[#167](https://github.com/apache/doris-flink-connector/pull/167) | +| --schema-change-mode | 解析 schema change 的模式,支持 `debezium_structure`、`sql_parser` 两种解析模式,默认采用 `debezium_structure` 模式。<br/><br/> `debezium_structure` 解析上游 CDC 
同步数据时所使用的数据结构,通过解析该结构判断 DDL 变更操作。 <br/> `sql_parser` 通过解析上游 CDC 同步数据时的 DDL 语句,从而判断 DDL 变更操作,因此该解析模式更加准确。<br/> 使用例子:`--schema-change-mode debezium_structure`<br/> 本功能将在 1.6.2.1 后的版本中提供| | --single-sink | 是否使用单个 Sink 同步所有表,开启后也可自动识别上游新创建的表,自动创建表。 | | --multi-to-one-origin | 将上游多张表写入同一张表时,源表的配置,比如:--multi-to-one-origin="a\_.\*\|b_.\*",具体参考[#208](https://github.com/apache/doris-flink-connector/pull/208) | | --multi-to-one-target | 与 multi-to-one-origin 搭配使用,目标表的配置,比如:--multi-to-one-target="a\|b" | -| --create-table-only | 是否只仅仅同步表结构 +| --create-table-only | 是否只仅仅同步表结构| >注:同步时需要在$FLINK_HOME/lib 目录下添加对应的 Flink CDC 依赖,比如 >flink-sql-connector-mysql-cdc-${version}.jar,flink-sql-connector-oracle-cdc-${version}.jar diff --git a/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md b/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md index ad705a1ba7f..91863b6ae78 100644 --- a/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md +++ b/versioned_docs/version-2.1/ecosystem/flink-doris-connector.md @@ -522,26 +522,27 @@ insert into doris_sink select id,name,bank,age from cdc_mysql_source; [--table-conf <doris-table-conf> [--table-conf <doris-table-conf> ...]] ``` -| Key | Comment | -| ----------------------- | ------------------------------------------------------------ | -| --job-name | Flink task name, optional | -| --database | Database name synchronized to Doris | -| --table-prefix | Doris table prefix name, such as --table-prefix ods_. | -| --table-suffix | Same as above, the suffix name of the Doris table. | -| --including-tables | For MySQL tables that need to be synchronized, you can use "|" to separate multiple tables and support regular expressions. For example --including-tables table1 | -| --excluding-tables | For tables that do not need to be synchronized, the usage is the same as above. | +| Key | Comment [...] 
+| ----------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...] +| --job-name | Flink task name, optional [...] +| --database | Database name synchronized to Doris [...] +| --table-prefix | Doris table prefix name, such as --table-prefix ods_. [...] +| --table-suffix | Same as above, the suffix name of the Doris table. [...] +| --including-tables | For MySQL tables that need to be synchronized, you can use "|" to separate multiple tables and support regular expressions. For example --including-tables table1 [...] +| --excluding-tables | For tables that do not need to be synchronized, the usage is the same as above. [...] | --mysql-conf | MySQL CDCSource configuration, for example --mysql-conf hostname=127.0.0.1, you can find it [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/mysql-cdc/) View all configurations MySQL-CDC, where hostname/username/password/database-name is required. When the synchronized library table contains a non-primary key table, `scan.incremental.snapshot.chunk.key-column` must be set, and only one field of non- [...] -| --oracle-conf | Oracle CDCSource configuration, for example --oracle-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/) View all configurations Oracle-CDC, where hostname/username/password/database-name/schema-name is required. | -| --postgres-conf | Postgres CDCSource configuration, e.g. 
--postgres-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/) View all configurations Postgres-CDC where hostname/username/password/database-name/schema-name/slot.name is required. |
-| --sqlserver-conf | SQLServer CDCSource configuration, for example --sqlserver-conf hostname=127.0.0.1, you can find it [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/) View all configurations SQLServer-CDC, where hostname/username/password/database-name/schema-name is required. |
-| --sink-conf | All configurations of Doris Sink can be found [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9) View the complete configuration items. |
-| --table-conf | The configuration items of the Doris table(The exception is table-buckets, non-properties attributes), that is, the content contained in properties. For example `--table-conf replication_num=1`, and the `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"` option specifies the number of buckets for different tables based on the order of regular expressions. If there is no match, the table is created with the default setting of BUCKETS AUTO. |
-| --ignore-default-value | Turn off the default value of synchronizing mysql table structure. It is suitable for synchronizing mysql data to doris when the field has a default value but the actual inserted data is null. Reference [here](https://github.com/apache/doris-flink-connector/pull/152) |
-| --use-new-schema-change | Whether to use the new schema change to support synchronization of MySQL multi-column changes and default values. since version 1.6.0, the default value has been set to true. Reference [here](https://github.com/apache/doris-flink-connector/pull/167) |
-| --single-sink | Whether to use a single Sink to synchronize all tables. When turned on, newly created tables in the upstream can also be automatically recognized and tables automatically created. |
-| --multi-to-one-origin | When writing multiple upstream tables into the same table, the configuration of the source table, for example: --multi-to-one-origin="a\_.\*|b_.\*", Reference [here](https://github.com/apache/doris-flink-connector/pull/208) |
-| --multi-to-one-target | Used with multi-to-one-origin, the configuration of the target table, such as: --multi-to-one-target="a\|b" |
-| --create-table-only | Whether only the table schema should be synchronized |
+| --oracle-conf | Oracle CDCSource configuration, for example --oracle-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/oracle-cdc/) View all configurations Oracle-CDC, where hostname/username/password/database-name/schema-name is required. [...]
+| --postgres-conf | Postgres CDCSource configuration, e.g. --postgres-conf hostname=127.0.0.1, you can find [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/postgres-cdc/) View all configurations Postgres-CDC where hostname/username/password/database-name/schema-name/slot.name is required. [...]
+| --sqlserver-conf | SQLServer CDCSource configuration, for example --sqlserver-conf hostname=127.0.0.1, you can find it [here](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.0/docs/connectors/legacy-flink-cdc-sources/sqlserver-cdc/) View all configurations SQLServer-CDC, where hostname/username/password/database-name/schema-name is required. [...]
+| --sink-conf | All configurations of Doris Sink can be found [here](https://doris.apache.org/zh-CN/docs/dev/ecosystem/flink-doris-connector/#%E9%80%9A%E7%94%A8%E9%85%8D%E7%BD%AE%E9%A1%B9) View the complete configuration items. [...]
+| --table-conf | The configuration items of the Doris table(The exception is table-buckets, non-properties attributes), that is, the content contained in properties. For example `--table-conf replication_num=1`, and the `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"` option specifies the number of buckets for different tables based on the order of regular expressions. If there is no match, the table is created with the default setting of BUCKETS AUTO. [...]
+| --ignore-default-value | Turn off the default value of synchronizing mysql table structure. It is suitable for synchronizing mysql data to doris when the field has a default value but the actual inserted data is null. Reference [here](https://github.com/apache/doris-flink-connector/pull/152) [...]
+| --use-new-schema-change | Whether to use the new schema change to support synchronization of MySQL multi-column changes and default values. since version 1.6.0, the default value has been set to true. Reference [here](https://github.com/apache/doris-flink-connector/pull/167) [...]
+| --schema-change-mode | The mode for parsing schema change supports two parsing modes: `debezium_structure` and `sql_parser`. The default mode is `debezium_structure`. <br/><br/> `debezium_structure` parses the data structure used when upstream CDC synchronizes data, and determines DDL change operations by parsing this structure. <br/> `sql_parser` determines the DDL change operation by parsing the DDL statement when the upstream CDC synchronizes data, so this parsing mode is more ac [...]
+| --single-sink | Whether to use a single Sink to synchronize all tables. When turned on, newly created tables in the upstream can also be automatically recognized and tables automatically created. [...]
+| --multi-to-one-origin | When writing multiple upstream tables into the same table, the configuration of the source table, for example: --multi-to-one-origin="a\_.\*|b_.\*", Reference [here](https://github.com/apache/doris-flink-connector/pull/208) [...]
+| --multi-to-one-target | Used with multi-to-one-origin, the configuration of the target table, such as: --multi-to-one-target="a\|b" [...]
+| --create-table-only | Whether only the table schema should be synchronized [...]
 >Note: When synchronizing, you need to add the corresponding Flink CDC dependencies in the $FLINK_HOME/lib directory, such as flink-sql-connector-mysql-cdc-${version}.jar, flink-sql-connector-oracle-cdc-${version}.jar
diff --git a/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md b/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md
index 264bc8b1a4e..c3f834f6d13 100644
--- a/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md
+++ b/versioned_docs/version-3.0/ecosystem/flink-doris-connector.md
@@ -542,6 +542,7 @@ insert into doris_sink select id,name,bank,age from cdc_mysql_source;
 | --table-conf | The configuration items of the Doris table(The exception is table-buckets, non-properties attributes), that is, the content contained in properties. For example `--table-conf replication_num=1`, and the `--table-conf table-buckets="tbl1:10,tbl2:20,a.*:30,b.*:40,.*:50"` option specifies the number of buckets for different tables based on the order of regular expressions. If there is no match, the table is created with the default setting of BUCKETS AUTO. |
 | --ignore-default-value | Turn off the default value of synchronizing mysql table structure. It is suitable for synchronizing mysql data to doris when the field has a default value but the actual inserted data is null. Reference [here](https://github.com/apache/doris-flink-connector/pull/152) |
 | --use-new-schema-change | Whether to use the new schema change to support synchronization of MySQL multi-column changes and default values. since version 1.6.0, the default value has been set to true. Reference [here](https://github.com/apache/doris-flink-connector/pull/167) |
+| --schema-change-mode | The mode for parsing schema change supports two parsing modes: `debezium_structure` and `sql_parser`. The default mode is `debezium_structure`. <br/><br/> `debezium_structure` parses the data structure used when upstream CDC synchronizes data, and determines DDL change operations by parsing this structure. <br/> `sql_parser` determines the DDL change operation by parsing the DDL statement when the upstream CDC synchronizes data, so this parsing mode is more ac [...]
 | --single-sink | Whether to use a single Sink to synchronize all tables. When turned on, newly created tables in the upstream can also be automatically recognized and tables automatically created. |
 | --multi-to-one-origin | When writing multiple upstream tables into the same table, the configuration of the source table, for example: --multi-to-one-origin="a\_.\*|b_.\*", Reference [here](https://github.com/apache/doris-flink-connector/pull/208) |
 | --multi-to-one-target | Used with multi-to-one-origin, the configuration of the target table, such as: --multi-to-one-target="a\|b" |
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org