This is an automated email from the ASF dual-hosted git repository. yunqing pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/inlong-website.git
The following commit(s) were added to refs/heads/master by this push: new f59f2a7186 [INLONG-579][Doc] Add doc for oracle connector for all migrate (#584) f59f2a7186 is described below commit f59f2a71867a85f6c447b7ea37fea67f12a64e06 Author: emhui <111486498+e-m...@users.noreply.github.com> AuthorDate: Tue Nov 8 14:29:35 2022 +0800 [INLONG-579][Doc] Add doc for oracle connector for all migrate (#584) * [INLONG-579][Doc] Add doc for oracle connector for all migrate --- docs/data_node/extract_node/oracle-cdc.md | 109 ++++++++++++++++++- .../current/data_node/extract_node/oracle-cdc.md | 117 +++++++++++++++++++-- 2 files changed, 216 insertions(+), 10 deletions(-) diff --git a/docs/data_node/extract_node/oracle-cdc.md b/docs/data_node/extract_node/oracle-cdc.md index f861091bcf..e330965912 100644 --- a/docs/data_node/extract_node/oracle-cdc.md +++ b/docs/data_node/extract_node/oracle-cdc.md @@ -294,7 +294,7 @@ TODO: It will be supported in the future. <td>required</td> <td style={{wordWrap: 'break-word'}}>(none)</td> <td>String</td> - <td>Table name of the Oracle database to monitor.</td> + <td>Table name of the Oracle database to monitor. The value is of the form <i><schema_name>.<table_name></i></td> </tr> <tr> <td>port</td> @@ -328,6 +328,13 @@ TODO: It will be supported in the future. <td>String</td> <td>Inlong metric label, format of value is groupId=xxgroup&streamId=xxstream&nodeId=xxnode.</td> </tr> + <tr> + <td>source.multiple.enable</td> + <td>optional</td> + <td style={{wordWrap: 'break-word'}}>false</td> + <td>Boolean</td> + <td>Whether to enable multiple schema and table migration. If it is' true ', Oracle Extract Node will compress the physical field of the table into a special meta field 'data_canal' in the format of 'canal json'.</td> + </tr> </tbody> </table> </div> @@ -378,6 +385,61 @@ The following format metadata can be exposed as read-only (VIRTUAL) columns in a <td>TIMESTAMP_LTZ(3) NOT NULL</td> <td>It indicates the time that the change was made in the database. <br/>If the record is read from snapshot of the table instead of the change stream, the value is always 0.</td> </tr> + <tr> + <td>meta.table_name</td> + <td>STRING NOT NULL</td> + <td>Name of the table that contain the row.</td> + </tr> + <tr> + <td>meta.schema_name</td> + <td>STRING NOT NULL</td> + <td>Name of the schema that contain the row.</td> + </tr> + <tr> + <td>meta.database_name</td> + <td>STRING NOT NULL</td> + <td>Name of the database that contain the row.</td> + </tr> + <tr> + <td>meta.op_ts</td> + <td>TIMESTAMP_LTZ(3) NOT NULL</td> + <td>It indicates the time that the change was made in the database. <br/>If the record is read from snapshot of the table instead of the change stream, the value is always 0.</td> + </tr> + <tr> + <td>meta.op_type</td> + <td>STRING</td> + <td>Type of database operation, such as INSERT/DELETE, etc.</td> + </tr> + <tr> + <td>meta.data_canal</td> + <td>STRING/BYTES</td> + <td>Data for rows in `canal-json` format only exists when the `source.multiple.enable` option is 'true'.</td> + </tr> + <tr> + <td>meta.is_ddl</td> + <td>BOOLEAN</td> + <td>Whether the DDL statement.</td> + </tr> + <tr> + <td>meta.ts</td> + <td>TIMESTAMP_LTZ(3) NOT NULL</td> + <td>The current time when the row was received and processed.</td> + </tr> + <tr> + <td>meta.sql_type</td> + <td>MAP</td> + <td>Mapping of sql_type table fields to java data type IDs.</td> + </tr> + <tr> + <td>meta.oracle_type</td> + <td>MAP</td> + <td>Structure of the table.</td> + </tr> + <tr> + <td>meta.pk_names</td> + <td>ARRAY</td> + <td>Primay key name of the table.</td> + </tr> </tbody> </table> @@ -387,7 +449,18 @@ CREATE TABLE products ( db_name STRING METADATA FROM 'database_name' VIRTUAL, schema_name STRING METADATA FROM 'schema_name' VIRTUAL, table_name STRING METADATA FROM 'table_name' VIRTUAL, - operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL, + op_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL, + meta_db_name STRING METADATA FROM 'meta.database_name' VIRTUAL, + meta_schema_name STRING METADATA FROM 'meta.schema_name' VIRTUAL, + meta_table_name STRING METADATA FROM 'meta.table_name' VIRTUAL, + meat_op_ts TIMESTAMP_LTZ(3) METADATA FROM 'meta.op_ts' VIRTUAL, + meta_op_type STRING METADATA FROM 'meta.op_type' VIRTUAL, + meta_data_canal STRING METADATA FROM 'meta.data_canal' VIRTUAL, + meta_is_ddl BOOLEAN METADATA FROM 'meta.is_ddl' VIRTUAL, + meta_ts TIMESTAMP_LTZ(3) METADATA FROM 'meta.ts' VIRTUAL, + meta_sql_type MAP<STRING, INT> METADATA FROM 'meta.sql_type' VIRTUAL, + meat_oracle_type MAP<STRING, STRING> METADATA FROM 'meta.oracle_type' VIRTUAL, + meta_pk_names ARRAY<STRING> METADATA FROM 'meta.pk_names' VIRTUAL ID INT NOT NULL, NAME STRING, DESCRIPTION STRING, @@ -401,7 +474,7 @@ CREATE TABLE products ( 'password' = 'flinkpw', 'database-name' = 'XE', 'schema-name' = 'inventory', - 'table-name' = 'products' + 'table-name' = 'inventory.products' ); ``` @@ -427,6 +500,36 @@ _Note: the mechanism of `scan.startup.mode` option relying on Debezium's `snapsh The Oracle Extract Node can't work in parallel reading, because there is only one task can receive change events. +### Whole Database, Multiple Schemas, Multiple Tables Migration + +Oracle Extract Node supports the whole database, multiple schemas, multiple tables migration function. When you enable this function, Oracle Extract Node will compress the physical field of the table into a special meta field 'data_canal' in the format of 'canal json'. + +config options: + +| Option | Required | Default | Type | Description | +| ---| ---| ---| ---| ---| +|source.multiple.enable|optional| false|String|Specify `'source.multiple.enable' = 'true'` to enable the whole database, multiple schemas, multiple tables migration function | +|schema-name|required|(none)|String| Schema name of the Oracle database to monitor. If you want to capture multiple schemas, you can use commas to separate them. For example: `'schema-name' = 'SCHEMA1,SCHEMA2'` | +|table-name| required | (none) |String| Table name of the Oracle database to monitor. If you want to capture multiple tables, you can use commas to separate them. For example: `'table-name' = 'SCHEMA1.TB.*, SCHEMA2.TB1'`| + +The CREATE TABLE example demonstrates the syntax of this function: + +```sql +CREATE TABLE node( + data STRING METADATA FROM 'meta.data_canal' VIRTUAL) + WITH ( + 'connector' = 'oracle-cdc-inlong', + 'hostname' = 'localhost', + 'port' = '1521', + 'username' = 'flinkuser', + 'password' = 'flinkpw', + 'database-name' = 'XE', + 'schema-name' = 'inventory', + 'table-name' = 'inventory..*', + 'source.multiple.enable' = 'true' +) +``` + ## Data Type Mapping <div class="wy-table-responsive"> diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/oracle-cdc.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/oracle-cdc.md index 565f3b4d53..fdf261a6f4 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/oracle-cdc.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/oracle-cdc.md @@ -294,7 +294,7 @@ TODO: 将在未来支持此功能。 <td>required</td> <td style={{wordWrap: 'break-word'}}>(none)</td> <td>String</td> - <td>要监视的 Oracle 数据库的表名。</td> + <td>要监视的 Oracle 数据库的表名。格式为<i><schema_name>.<table_name></i></td> </tr> <tr> <td>port</td> @@ -329,6 +329,13 @@ Oracle CDC 消费者的可选启动模式,有效枚举为"initial" <td>String</td> <td>inlong metric 的标签值,该值的构成为groupId&streamId&nodeId。</td> </tr> + <tr> + <td>source.multiple.enable</td> + <td>可选</td> + <td style={{wordWrap: 'break-word'}}>false</td> + <td>Boolean</td> + <td>是否开启多模式、表同步功能,如果为 'true',Oracle Extract Node 则将表的物理字段压缩成 'canal-json' 格式的特殊元字段 'data_canal'。</td> + </tr> </tbody> </table> </div> @@ -361,22 +368,77 @@ restart-strategy.fixed-delay.attempts: 2147483647 <tr> <td>table_name</td> <td>STRING NOT NULL</td> - <td>Name of the table that contain the row.</td> + <td>该行所属的表名。</td> </tr> <tr> <td>schema_name</td> <td>STRING NOT NULL</td> - <td>Name of the schema that contain the row.</td> + <td>该行所属的模式名称。</td> </tr> <tr> <td>database_name</td> <td>STRING NOT NULL</td> - <td>Name of the database that contain the row.</td> + <td>该行所属的数据库名称。</td> </tr> <tr> <td>op_ts</td> <td>TIMESTAMP_LTZ(3) NOT NULL</td> - <td>It indicates the time that the change was made in the database. <br/>If the record is read from snapshot of the table instead of the change stream, the value is always 0.</td> + <td>它指示在数据库中进行更改的时间。<br/>如果记录从表的快照而不是change流中读取,则该值始终为0。</td> + </tr> + <tr> + <td>meta.table_name</td> + <td>STRING NOT NULL</td> + <td>该行所属的表名。</td> + </tr> + <tr> + <td>meta.schema_name</td> + <td>STRING NOT NULL</td> + <td>该行所属的模式名称。</td> + </tr> + <tr> + <td>meta.database_name</td> + <td>STRING NOT NULL</td> + <td>该行所属的数据库名称。</td> + </tr> + <tr> + <td>meta.op_ts</td> + <td>TIMESTAMP_LTZ(3) NOT NULL</td> + <td>它指示在数据库中进行更改的时间。<br/>如果记录从表的快照而不是change流中读取,则该值始终为0。</td> + </tr> + <tr> + <td>meta.op_type</td> + <td>STRING</td> + <td>数据库操作的类型,如 INSERT/DELETE 等。</td> + </tr> + <tr> + <td>meta.data_canal</td> + <td>STRING/BYTES</td> + <td>`canal-json` 格式化的行的数据只有在 `source.multiple.enable` 选项为 'true' 时才存在。</td> + </tr> + <tr> + <td>meta.is_ddl</td> + <td>BOOLEAN</td> + <td>是否是 DDL 语句。</td> + </tr> + <tr> + <td>meta.ts</td> + <td>TIMESTAMP_LTZ(3) NOT NULL</td> + <td>接收和处理行的当前时间。</td> + </tr> + <tr> + <td>meta.sql_type</td> + <td>MAP</td> + <td>将 Sql_type 表字段映射到 Java 数据类型 Id。</td> + </tr> + <tr> + <td>meta.oracle_type</td> + <td>MAP</td> + <td>表的结构。</td> + </tr> + <tr> + <td>meta.pk_names</td> + <td>ARRAY</td> + <td>表的主键名称。</td> </tr> </tbody> </table> @@ -388,7 +450,18 @@ CREATE TABLE products ( db_name STRING METADATA FROM 'database_name' VIRTUAL, schema_name STRING METADATA FROM 'schema_name' VIRTUAL, table_name STRING METADATA FROM 'table_name' VIRTUAL, - operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL, + op_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL, + meta_db_name STRING METADATA FROM 'meta.database_name' VIRTUAL, + meta_schema_name STRING METADATA FROM 'meta.schema_name' VIRTUAL, + meta_table_name STRING METADATA FROM 'meta.table_name' VIRTUAL, + meat_op_ts TIMESTAMP_LTZ(3) METADATA FROM 'meta.op_ts' VIRTUAL, + meta_op_type STRING METADATA FROM 'meta.op_type' VIRTUAL, + meta_data_canal STRING METADATA FROM 'meta.data_canal' VIRTUAL, + meta_is_ddl BOOLEAN METADATA FROM 'meta.is_ddl' VIRTUAL, + meta_ts TIMESTAMP_LTZ(3) METADATA FROM 'meta.ts' VIRTUAL, + meta_sql_type MAP<STRING, INT> METADATA FROM 'meta.sql_type' VIRTUAL, + meat_oracle_type MAP<STRING, STRING> METADATA FROM 'meta.oracle_type' VIRTUAL, + meta_pk_names ARRAY<STRING> METADATA FROM 'meta.pk_names' VIRTUAL ID INT NOT NULL, NAME STRING, DESCRIPTION STRING, @@ -402,7 +475,7 @@ CREATE TABLE products ( 'password' = 'flinkpw', 'database-name' = 'XE', 'schema-name' = 'inventory', - 'table-name' = 'products' + 'table-name' = 'inventory.products' ); ``` @@ -429,6 +502,36 @@ _注意: `scan.startup.mode` 选项的机制依赖于 Debezium 的`snapshot.mode Oracle Extract 节点不能并行读取,因为只有一个任务可以接收更改事件。 +### 整库、多模式、表同步 + +Oracle Extract 节点支持整库、多模式、多表同步。开启该功能后,Oracel Extract 节点会将表的物理字段压缩成 'canal-json' 格式的特殊元字段 'data_canal'。 + +配置参数: + +| 参数 | 是否必须 | 默认值 | 数据类型 | 描述 | +| ---| ---| ---| ---| ---| +|source.multiple.enable|optional| false|String| 指定`'source.multiple.enable' = 'true'`参数开启整库、多模式、多表同步功能 | +|schema-name|required|(none)|String| 要监视的 Oracle 数据库的 Schema 名称。如果要捕获多个模式,可以使用逗号分割它们。例如:`'schema-name' = 'SCHEMA1,SCHEMA2'` | +|table-name| required | (none) |String| 要监视的 Oracle 数据库的表名。如果要捕获多个表,可以使用逗号分割它们。例如:`'table-name' = 'SCHEMA1.TB.*, SCHEMA2.TB1'`| + +CREATE TABLE 示例演示该功能语法: + +```sql +CREATE TABLE node( + data STRING METADATA FROM 'meta.data_canal' VIRTUAL) + WITH ( + 'connector' = 'oracle-cdc-inlong', + 'hostname' = 'localhost', + 'port' = '1521', + 'username' = 'flinkuser', + 'password' = 'flinkpw', + 'database-name' = 'XE', + 'schema-name' = 'inventory', + 'table-name' = 'inventory..*', + 'source.multiple.enable' = 'true' +) +``` + ## 数据类型映射 <div class="wy-table-responsive">