davidzollo commented on code in PR #8884: URL: https://github.com/apache/seatunnel/pull/8884#discussion_r1981404884
########## docs/zh/connector-v2/source/MySQL-CDC.md: ########## @@ -0,0 +1,342 @@ +# MySQL CDC + +> MySQL CDC 源连接器 + +## 支持这些引擎 + +> SeaTunnel Zeta<br/> +> Flink <br/> + +## 描述 + +MySQL CDC连接器允许从MySQL数据库读取快照和增量数据. 本文档描述了如何设置MySQL CDC连接器以针对MySQL数据库运行SQL查询. + +## 主要功能 + +- [ ] [批处理](../../concept/connector-v2-features.md) +- [x] [流处理](../../concept/connector-v2-features.md) +- [x] [精确一次](../../concept/connector-v2-features.md) +- [ ] [列投影](../../concept/connector-v2-features.md) +- [x] [并行度](../../concept/connector-v2-features.md) +- [x] [支持自定义分片](../../concept/connector-v2-features.md) + +## 支持的数据源信息 + +| 数据源 | 支持的版本 | 驱动 | Url | Maven | +|-------|------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------|----------------------------------|----------------------------------------------------------------------| +| MySQL | <li> [MySQL](https://dev.mysql.com/doc): 5.5, 5.6, 5.7, 8.0.x </li><li> [RDS MySQL](https://www.aliyun.com/product/rds/mysql): 5.6, 5.7, 8.0.x </li> | com.mysql.cj.jdbc.Driver | jdbc:mysql://localhost:3306/test | https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.28 | + +## 依赖 + +### 安装Jdbc驱动 + +#### 对于Flink引擎 + +> 1. 你需要确保 [jdbc 驱动 jar 包](https://mvnrepository.com/artifact/mysql/mysql-connector-java) 已经放在目录 `${SEATUNNEL_HOME}/plugins/`. + +#### 对于SeaTunnel Zeta引擎 + +> 1. 你需要确保 [jdbc 驱动 jar 包](https://mvnrepository.com/artifact/mysql/mysql-connector-java) 已经放在目录 `${SEATUNNEL_HOME}/lib/`. + +### 创建MySQL用户 + +您必须定义一个MySQL用户,该用户对Debezium MySQL连接器所监控的所有数据库拥有适当的权限. + +1. 创建MySQL用户: + +```sql +mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'password'; +``` + +2. 给用户赋予所需权限: + +```sql +mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user' IDENTIFIED BY 'password'; +``` + +3. 最终确定用户权限: + +```sql +mysql> FLUSH PRIVILEGES; +``` + +### 启用MySQL Binlog + +一定要为MySQL复制启用二进制日志。二进制日志记录事务更新以供复制工具传播更改. + +1. 检查`log-bin`是否已经设置为on: + +```sql +mysql> show variables where variable_name in ('log_bin', 'binlog_format', 'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency'); ++--------------------------+----------------+ +| Variable_name | Value | ++--------------------------+----------------+ +| binlog_format | ROW | +| binlog_row_image | FULL | +| enforce_gtid_consistency | ON | +| gtid_mode | ON | +| log_bin | ON | ++--------------------------+----------------+ +5 rows in set (0.00 sec) +``` + +2. 如果以上结果不一致, 配置你的MySQL server配置文件(`$MYSQL_HOME/mysql.cnf`) ,配置文件中包含以下属性,这些属性在以下表格中有描述: + +``` +# Enable binary replication log and set the prefix, expiration, and log format. +# The prefix is arbitrary, expiration can be short for integration tests but would +# be longer on a production system. Row-level info is required for ingest to work. +# Server ID is required, but this will vary on production systems +server-id = 223344 +log_bin = mysql-bin +expire_logs_days = 10 +binlog_format = row +# mysql 5.6+ requires binlog_row_image to be set to FULL +binlog_row_image = FULL + +# enable gtid mode +# mysql 5.6+ requires gtid_mode to be set to ON +gtid_mode = on +enforce_gtid_consistency = on +``` + +3. 重启MySQL Server + +```shell +/etc/inint.d/mysqld restart +``` + +4. 修改之后再检查一次binlog的状态: + +MySQL 5.5: + +```sql +mysql> show variables where variable_name in ('log_bin', 'binlog_format', 'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency'); ++--------------------------+----------------+ +| Variable_name | Value | ++--------------------------+----------------+ +| binlog_format | ROW | +| log_bin | ON | ++--------------------------+----------------+ +5 rows in set (0.00 sec) +``` + +MySQL 5.6+: + +```sql +mysql> show variables where variable_name in ('log_bin', 'binlog_format', 'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency'); ++--------------------------+----------------+ +| Variable_name | Value | ++--------------------------+----------------+ +| binlog_format | ROW | +| binlog_row_image | FULL | +| enforce_gtid_consistency | ON | +| gtid_mode | ON | +| log_bin | ON | ++--------------------------+----------------+ +5 rows in set (0.00 sec) +``` + +### 提示 + +#### 配置MySQL session超时时长 + +当为量大的数据库初始一致快照时,在读取表的过程中,您已建立的连接可能会超时。您可以通过在MySQL配置文件中配置interactive_timeout(交互超时时间)和wait_timeout(等待超时时间)来防止这种行为. +- `interactive_timeout`: 服务器在关闭交互连接之前等待活动(交互操作)的秒数. 详见[MySQL’s documentation](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_interactive_timeout). +- `wait_timeout`: 服务器在关闭非交互式连接之前等待其活动的秒数. 详见[MySQL’s documentation](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_wait_timeout). + +* 更多的数据库配置,见 [Debezium MySQL Connector](https://github.com/debezium/debezium/blob/v1.9.8.Final/documentation/modules/ROOT/pages/connectors/mysql.adoc#setting-up-mysql)* + +## 数据类型映射 + +| Mysql数据类型 | SeaTunnel数据类型 | +|------------------------------------------------------------------------------------------------|---------------| +| BIT(1)<br/>TINYINT(1) | BOOLEAN | +| TINYINT | TINYINT | +| TINYINT UNSIGNED<br/>SMALLINT | SMALLINT | +| SMALLINT UNSIGNED<br/>MEDIUMINT<br/>MEDIUMINT UNSIGNED<br/>INT<br/>INTEGER<br/>YEAR | INT | +| INT UNSIGNED<br/>INTEGER UNSIGNED<br/>BIGINT | BIGINT | +| BIGINT UNSIGNED | DECIMAL(20,0) | +| DECIMAL(p, s) <br/>DECIMAL(p, s) UNSIGNED <br/>NUMERIC(p, s) <br/>NUMERIC(p, s) UNSIGNED | DECIMAL(p,s) | +| FLOAT<br/>FLOAT UNSIGNED | FLOAT | +| DOUBLE<br/>DOUBLE UNSIGNED<br/>REAL<br/>REAL UNSIGNED | DOUBLE | +| CHAR<br/>VARCHAR<br/>TINYTEXT<br/>MEDIUMTEXT<br/>TEXT<br/>LONGTEXT<br/>ENUM<br/>JSON<br/>ENUM | STRING | +| DATE | DATE | +| TIME(s) | TIME(s) | +| DATETIME<br/>TIMESTAMP(s) | TIMESTAMP(s) | +| BINARY<br/>VARBINAR<br/>BIT(p)<br/>TINYBLOB<br/>MEDIUMBLOB<br/>BLOB<br/>LONGBLOB <br/>GEOMETRY | BYTES | + +## 配置参数选项 + +| 参数名称 | 类型 | 是否必须 | 默认值 | 描述 | +|------------------------------------------------|----------|----------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| base-url | String | Yes | - | JDBC连接的URL. 例如: `jdbc:mysql://localhost:3306:3306/test`. | +| username | String | Yes | - | 用来连接到数据库服务的数据库名称. | +| password | String | Yes | - | 连接到数据库服务所使用的密码. | +| database-names | List | No | - | 要监控的数据库名称. | +| database-pattern | String | No | .* | 要捕获的数据库名称的正则表达式, 例如: `database_prefix.*`. | +| table-names | List | Yes | - | 要监控的表名. 表名需要包括库名, 例如: `database_name.table_name` | +| table-pattern | String | Yes | - | 要捕获的表名称的正则表达式. 表名需要包括库名, 例如: `database.*\\.table_.*` | +| table-names-config | List | No | - | 表配置的列表集合. 例如: [{"table": "db1.schema1.table1","primaryKeys": ["key1"],"snapshotSplitColumn": "key2"}] | +| startup.mode | Enum | No | INITIAL | MySQL CDC 消费者的可选启动模式, 有效枚举值为 `initial`, `earliest`, `latest` 和 `specific`. <br/> `initial`: 启动时同步历史数据, 然后同步增量数据.<br/> `earliest`: 从尽可能最早的偏移量开始启动.<br/> `latest`: 从最近的偏移量启动.<br/> `specific`: 从用户提供的特定偏移量开始启动. | +| startup.specific-offset.file | String | No | - | 从指定的binlog日志文件名开始. **注意, 当使用 `startup.mode` 选项为 `specific` 时,此选项为必填项.** | +| startup.specific-offset.pos | Long | No | - | 从指定的binlog日志文件位置开始. **注意, 当使用 `startup.mode` 选项为 `specific` 时,此选项为必填项.** | +| stop.mode | Enum | No | NEVER | MySQL CDC 消费者的可选停止模式, 有效枚举值为 `never`, `latest` 和 `specific`. <br/> `never`: 实时任务一直运行不停止.<br/> `latest`: 从最新的偏移量处停止.<br/> `specific`: 从用户提供的特定偏移量处停止. | Review Comment: Please update Yes -> `是` and No -> `否` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org