MOBIN-F commented on code in PR #4033:
URL: https://github.com/apache/flink-cdc/pull/4033#discussion_r2123339621
##########
docs/content.zh/docs/get-started/quickstart/mysql-to-kafka.md:
##########
@@ -0,0 +1,588 @@
+---
+title: "MySQL to Kafka"
+weight: 2
+type: docs
+aliases:
+- /try-flink-cdc/pipeline-connectors/mysql-Kafka-pipeline-tutorial.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Streaming ELT from MySQL to Kafka
+
+This tutorial shows how to quickly build a streaming ELT job from MySQL to Kafka with Flink CDC, covering whole-database synchronization, schema change synchronization, and synchronization of sharded databases and tables.
+The entire walkthrough runs in the Flink CDC CLI: no Java/Scala code and no IDE installation required.
+
+## Preparation
+Prepare a Linux or macOS machine with Docker installed.
+
+### Prepare a Flink Standalone cluster
+1. Download [Flink 1.20.1](https://archive.apache.org/dist/flink/flink-1.20.1/flink-1.20.1-bin-scala_2.12.tgz) and extract it to get a flink-1.20.1 directory.
+   Use the following commands to enter the Flink directory and set FLINK_HOME to the directory where flink-1.20.1 is located.
+
+   ```shell
+   tar -zxvf flink-1.20.1-bin-scala_2.12.tgz
+   export FLINK_HOME=$(pwd)/flink-1.20.1
+   cd flink-1.20.1
+   ```
+
+2. Enable checkpointing by appending the following settings to the conf/config.yaml configuration file, so that a checkpoint is taken every 3 seconds.
+
+   ```yaml
+   execution:
+     checkpointing:
+       interval: 3000
+   ```
+
+3. Start the Flink cluster with the following command.
+
+   ```shell
+   ./bin/start-cluster.sh
+   ```
+
+If startup succeeds, you can reach the Flink Web UI at [http://localhost:8081/](http://localhost:8081/), as shown below:
+
+{{< img src="/fig/mysql-Kafka-tutorial/flink-ui.png" alt="Flink UI" >}}
+
+Executing start-cluster.sh multiple times launches additional TaskManagers.
+Note: on a cloud server where localhost is not reachable from your browser, change rest.bind-address and rest.address in conf/config.yaml from localhost to 0.0.0.0, then access the UI via public-IP:8081.
+
+### Prepare the Docker environment
+Create a `docker-compose.yml` file with the following content:
+
+   ```yaml
+   version: '2.1'
+   services:
+     Zookeeper:

Review Comment:
   Unified indentation style?
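For illustration, the quoted snippet with one uniform two-space step per YAML nesting level and an unindented top-level code fence might look like this (an editor's sketch of one possible unified style, not the committed fix):

```yaml
version: '2.1'
services:
  Zookeeper:
    image: zookeeper:3.7.1
    ports:
      - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
```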
##########
docs/content.zh/docs/get-started/quickstart/mysql-to-kafka.md:
##########
+       image: zookeeper:3.7.1
+       ports:
+         - "2181:2181"
+       environment:
+         - ALLOW_ANONYMOUS_LOGIN=yes
+     Kafka:
+       image: bitnami/kafka:2.8.1
+       ports:
+         - "9092:9092"
+         - "9093:9093"
+       environment:
+         - ALLOW_PLAINTEXT_LISTENER=yes
+         - KAFKA_LISTENERS=PLAINTEXT://:9092
+         - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://192.168.67.2:9092
+         - KAFKA_ZOOKEEPER_CONNECT=192.168.67.2:2181
+     MySQL:
+       image: debezium/example-mysql:1.1
+       ports:
+         - "3306:3306"
+       environment:
+         - MYSQL_ROOT_PASSWORD=123456
+         - MYSQL_USER=mysqluser
+         - MYSQL_PASSWORD=mysqlpw
+   ```
+Note: 192.168.67.2 in this file is the machine's internal IP, which you can look up with ifconfig.
+This Docker Compose file contains the following containers:
+- MySQL: the database `app_db` holding product information
+- Kafka: stores the result tables mapped from MySQL according to the routing rules
+- Zookeeper: manages and coordinates the Kafka cluster
+
+Run the following command in the directory containing `docker-compose.yml` to start the components required by this tutorial:
+
+   ```shell
+   docker-compose up -d
+   ```
+
+This command automatically starts all containers defined in the Docker Compose configuration in detached mode. You can use docker ps to check whether these containers started properly, as shown below.
+{{< img src="/fig/mysql-Kafka-tutorial/docker-ps.png" alt="Docker ps" >}}
+#### Prepare data in the MySQL database
+1. Enter the MySQL container
+
+   ```shell
+   docker-compose exec MySQL mysql -uroot -p123456
+   ```
+
+2. Create database `app_db` and tables `orders`, `products`, `shipments`, and insert data
+
+   ```sql
+   -- Create database
+   CREATE DATABASE app_db;
+
+   USE app_db;
+
+   -- Create orders table
+   CREATE TABLE `orders` (
+     `id` INT NOT NULL,
+     `price` DECIMAL(10,2) NOT NULL,
+     PRIMARY KEY (`id`)
+   );
+
+   -- Insert data
+   INSERT INTO `orders` (`id`, `price`) VALUES (1, 4.00);
+   INSERT INTO `orders` (`id`, `price`) VALUES (2, 100.00);
+
+   -- Create shipments table
+   CREATE TABLE `shipments` (
+     `id` INT NOT NULL,
+     `city` VARCHAR(255) NOT NULL,
+     PRIMARY KEY (`id`)
+   );
+
+   -- Insert data
+   INSERT INTO `shipments` (`id`, `city`) VALUES (1, 'beijing');
+   INSERT INTO `shipments` (`id`, `city`) VALUES (2, 'xian');
+
+   -- Create products table
+   CREATE TABLE `products` (
+     `id` INT NOT NULL,
+     `product` VARCHAR(255) NOT NULL,
+     PRIMARY KEY (`id`)
+   );
+
+   -- Insert data
+   INSERT INTO `products` (`id`, `product`) VALUES (1, 'Beer');
+   INSERT INTO `products` (`id`, `product`) VALUES (2, 'Cap');
+   INSERT INTO `products` (`id`, `product`) VALUES (3, 'Peanut');
+   ```
+
+## Submit the job with the Flink CDC CLI
+1. Download the binary tarball listed below and extract it to get the directory `flink-cdc-{{< param Version >}}`;
+   [flink-cdc-{{< param Version >}}-bin.tar.gz](https://www.apache.org/dyn/closer.lua/flink/flink-cdc-{{< param Version >}}/flink-cdc-{{< param Version >}}-bin.tar.gz)
+   flink-cdc-{{< param Version >}} contains the four directories `bin`, `lib`, `log`, and `conf`.
+
+2. Download the connector jars listed below and move them into the lib directory;
+   **The download links only work for released versions; SNAPSHOT versions need to be built locally from the master or a release- branch.**
+   **Note that the jars must be moved into the lib directory of the Flink CDC home, not the lib directory of the Flink home.**
+   - [MySQL pipeline connector {{< param Version >}}](https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/{{< param Version >}}/flink-cdc-pipeline-connector-mysql-{{< param Version >}}.jar)
+   - [Kafka pipeline connector {{< param Version >}}](https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/{{< param Version >}}/flink-cdc-pipeline-connector-kafka-{{< param Version >}}.jar)
+
+   You also need to place the driver below in the Flink `lib` directory, or pass it to the Flink CDC CLI with the `--jar` option, because the CDC connectors no longer bundle these drivers:
+   - [MySQL Connector Java](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.27/mysql-connector-java-8.0.27.jar)
+
+3. Write the pipeline configuration YAML file.
+   Below is an example file mysql-to-Kafka.yaml for whole-database synchronization:
+
+   ```yaml
+   ################################################################################
+   # Description: Sync MySQL all tables to Kafka
+   ################################################################################
+   source:
+     type: mysql
+     hostname: 0.0.0.0
+     port: 3306
+     username: root
+     password: 123456
+     tables: app_db.\.*
+     server-id: 5400-5404
+     server-time-zone: UTC
+
+   sink:
+     type: kafka
+     name: Kafka Sink
+     properties.bootstrap.servers: 0.0.0.0:9092
+     topic: yaml-mysql-kafka
+
+
+   pipeline:
+     name: MySQL to Kafka Pipeline
+     parallelism: 1
+
+
+   ```

Review Comment:
   Remove extra blank lines
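For reference, here is the same pipeline definition with the extra blank lines stripped, otherwise exactly as quoted above (an editor's sketch of the requested cleanup):

```yaml
################################################################################
# Description: Sync MySQL all tables to Kafka
################################################################################
source:
  type: mysql
  hostname: 0.0.0.0
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*
  server-id: 5400-5404
  server-time-zone: UTC

sink:
  type: kafka
  name: Kafka Sink
  properties.bootstrap.servers: 0.0.0.0:9092
  topic: yaml-mysql-kafka

pipeline:
  name: MySQL to Kafka Pipeline
  parallelism: 1
```

A definition like this would then be submitted from the extracted flink-cdc-{{< param Version >}} directory with the bundled CLI script (a sketch assuming the documented bin/flink-cdc.sh entry point and FLINK_HOME set as in the tutorial):

```shell
# Submit the pipeline; the CLI locates the Flink cluster via $FLINK_HOME
bash bin/flink-cdc.sh mysql-to-Kafka.yaml
```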
##########
docs/content.zh/docs/get-started/quickstart/mysql-to-kafka.md:
##########
+     Kafka:
+       image: bitnami/kafka:2.8.1
+       ports:
+         - "9092:9092"
+         - "9093:9093"
+       environment:
+         - ALLOW_PLAINTEXT_LISTENER=yes
+         - KAFKA_LISTENERS=PLAINTEXT://:9092
+         - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://192.168.67.2:9092
+         - KAFKA_ZOOKEEPER_CONNECT=192.168.67.2:2181
+     MySQL:
+       image: debezium/example-mysql:1.1
+       ports:
+         - "3306:3306"
+       environment:
+         - MYSQL_ROOT_PASSWORD=123456
+         - MYSQL_USER=mysqluser
+         - MYSQL_PASSWORD=mysqlpw
+   ```
+Note: 192.168.67.2 in this file is the machine's internal IP, which you can look up with ifconfig.

Review Comment:
   192.168.67.2 -> kafka or zookeeper? In the same docker network, we can use service names to communicate directly
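To illustrate the reviewer's suggestion: containers on the same Docker Compose network can reach each other by service name, so the Zookeeper address needs no hard-coded IP. The advertised listener, however, is the address Kafka hands to its clients, and the Flink job runs on the host outside the compose network, so it still needs a host-reachable address; with port 9092 published, localhost works from the host. A sketch of the Kafka service along those lines (an editor's illustration, not the committed fix):

```yaml
  Kafka:
    image: bitnami/kafka:2.8.1
    ports:
      - "9092:9092"
      - "9093:9093"
    environment:
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_LISTENERS=PLAINTEXT://:9092
      # Inter-container traffic can use the compose service name directly
      - KAFKA_ZOOKEEPER_CONNECT=Zookeeper:2181
      # The advertised address must be reachable by clients outside the
      # compose network (the host-side Flink job); the published port
      # makes localhost:9092 work from the host
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092
```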
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
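Once the pipeline above is running, a quick way to eyeball the synchronized records is the console consumer bundled in the Bitnami Kafka image (a sketch; the topic name is taken from the mysql-to-Kafka.yaml quoted earlier, and the bootstrap address from the compose file):

```shell
# Read everything published so far to the pipeline's topic
docker-compose exec Kafka kafka-console-consumer.sh \
  --bootstrap-server 192.168.67.2:9092 \
  --topic yaml-mysql-kafka \
  --from-beginning
```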