This is an automated email from the ASF dual-hosted git repository. dockerzhang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/inlong-website.git
The following commit(s) were added to refs/heads/master by this push: new 218862c866 [INLONG-537][Sort][Agent] Add TubeMQ connector docs and update Agent docs (#538) 218862c866 is described below commit 218862c866bcb6cbe1bdc20a56b04ca77d25e3ff Author: ganfengtan <ganfeng...@users.noreply.github.com> AuthorDate: Tue Sep 13 17:21:11 2022 +0800 [INLONG-537][Sort][Agent] Add TubeMQ connector docs and update Agent docs (#538) --- docs/data_node/extract_node/tube.md | 70 ++++++++++++++++++++++ docs/modules/agent/overview.md | 15 +++++ docs/modules/sort/quick_start.md | 3 +- .../current/data_node/extract_node/tube.md | 69 +++++++++++++++++++++ .../current/modules/agent/overview.md | 16 +++++ .../current/modules/sort/quick_start.md | 3 +- 6 files changed, 174 insertions(+), 2 deletions(-) diff --git a/docs/data_node/extract_node/tube.md b/docs/data_node/extract_node/tube.md new file mode 100644 index 0000000000..89ce32803d --- /dev/null +++ b/docs/data_node/extract_node/tube.md @@ -0,0 +1,70 @@ +--- +title: TubeMQ +sidebar_position: 11 +--- + +import {siteVariables} from '../../version'; + +## Overview + +[Apache InLong TubeMQ](https://inlong.apache.org/docs/modules/tubemq/overview) is a distributed, open-source pub-sub messaging and streaming platform for real-time workloads, suited to massive data at trillion-message scale. + +## Version + +| Extract Node | Version | | --------------------- | ------------------------------------------------------------ | | [TubeMQ](./tube.md) | [TubeMQ](https://inlong.apache.org/docs/next/modules/tubemq/overview): >=0.1.0<br/> | + +## Dependencies + +To set up the `TubeMQ Extract Node`, the following provides dependency information both for projects using a +build automation tool (such as Maven or SBT) and for the SQL Client with Sort Connector JAR bundles.
+ +### Maven dependency + +<pre><code parentName="pre"> +{`<dependency> + <groupId>org.apache.inlong</groupId> + <artifactId>sort-connector-tubemq</artifactId> + <version>${siteVariables.inLongVersion}</version> +</dependency> +`} +</code></pre> + +## How to create a TubeMQ Extract Node + +### Usage for SQL API + +The example below shows how to create a TubeMQ Extract Node with `Flink SQL Cli`: +```sql +-- Create a TubeMQ table 'tube_extract_node' in Flink SQL Cli +Flink SQL> CREATE TABLE tube_extract_node ( + id INT, + name STRING, + age INT, + salary FLOAT + ) WITH ( + 'connector' = 'tubemq', + 'topic' = 'topicName', + 'masterRpc' = 'rpcUrl', -- 127.0.0.1:8715 + 'format' = 'json', + 'groupId' = 'groupName'); + +-- Read data from tube_extract_node +Flink SQL> SELECT * FROM tube_extract_node; +``` +### Usage for InLong Dashboard +TODO + +### Usage for InLong Manager Client +TODO + +## TubeMQ Extract Node Options + +| Parameter | Required | Default value | Type | Description | | ----------------------------- | -------- | ------------- | ------ | ------------------------------------------------------------ | | connector | required | tubemq | String | Set the connector type. The only available option is `tubemq`. | | topic | required | (none) | String | Set the topic to read from. | | masterRpc | required | (none) | String | Set the TubeMQ master service address. | | format | required | (none) | String | TubeMQ message value serialization format; supports JSON, Avro, etc. For more information, see the [Flink format](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/formats/overview/).
| groupId | required | (none) | String | Consumer group in TubeMQ. | \ No newline at end of file diff --git a/docs/modules/agent/overview.md b/docs/modules/agent/overview.md index 2eccf4b734..7bc65e59e5 100644 --- a/docs/modules/agent/overview.md +++ b/docs/modules/agent/overview.md @@ -25,6 +25,21 @@ User-configured path monitoring, able to monitor the created file information Directory regular filtering, support YYYYMMDD+regular expression path configuration Breakpoint retransmission, when InLong-Agent restarts, it can automatically re-read from the last read position to ensure no reread or missed reading. +#### File options +| Parameter | Required | Default value | Type | Description | +| ----------------------------- | -------- | ------------- | ------ | ------------------------------------------------------------ | +| pattern | required | (none) | String | File pattern. For example: /root/[*].log | +| timeOffset | optional | (none) | String | The file name contains a time component such as *** YYYYMMDDHH ***, where YYYY is the year, MM the month, DD the day, and HH the hour; *** stands for any characters. '1m' means one minute later, '-1m' one minute earlier; '1h' means one hour later, '-1h' one hour earlier; '1d' means one day later, '-1d' one day earlier. | +| collectType | optional | FULL | String | FULL collects all matched files; INCREMENT collects only files created after the task starts. | +| lineEndPattern | optional | '\n' | String | Pattern that marks the end of a line in the file. | +| contentCollectType | optional | FULL | String | How file content is collected: FULL or INCREMENT. | +| envList | optional | (none) | String | Environment information collected with the file, for example in a container environment: kubernetes. | +| dataContentStyle | optional | (none) | String | Output style of the collected data. For JSON format, set this parameter to json; for CSV format, set it to a custom separator such as `,` or `:`. | +| dataSeparator | optional | (none) | String | Column separator of the data source.
| monitorStatus | optional | (none) | Integer| Monitor switch: 1 enables monitoring, 0 disables it. Use 0 for batch data and 1 for real-time data. | +| monitorInterval | optional | (none) | Long | File monitoring interval, in milliseconds. | +| monitorExpire | optional | (none) | Long | Monitor expiry time, in milliseconds. | + ### SQL This type of data refers to the way it is executed through SQL SQL regular decomposition, converted into multiple SQL statements diff --git a/docs/modules/sort/quick_start.md b/docs/modules/sort/quick_start.md index cc1bb93592..93994bc7c2 100644 --- a/docs/modules/sort/quick_start.md +++ b/docs/modules/sort/quick_start.md @@ -14,7 +14,8 @@ Currently, InLong Sort relies on Flink-1.13.5. Chose `flink-1.13.5-bin-scala_2.1 - InLong Sort file, [Download](https://inlong.apache.org/download/) `apache-inlong-[version]-bin.tar.gz` - Data Nodes Connectors, [Download](https://inlong.apache.org/download/) `apache-inlong-[version]-sort-connectors.tar.gz` -Notice: Please put required Connectors jars into under `FLINK_HOME/lib/` after download. +Notice: Please put the required connector JARs under `FLINK_HOME/lib/` after downloading. +Put [mysql-connector-java:8.0.21.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.21/mysql-connector-java-8.0.21.jar) into `FLINK_HOME/lib/` when you use the `mysql-cdc-inlong` connector.
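The notes above describe a manual copy step; as a rough sketch, that step could be scripted as follows (directory names and the version in the example JAR file name are illustrative assumptions, not part of the InLong docs):

```shell
# Sketch of the "copy connector JARs into FLINK_HOME/lib" step above.
# All paths here are assumptions; adjust them to your environment.
install_connectors() {
  # $1: directory containing the unpacked sort-connector JARs
  # $2: Flink installation directory (FLINK_HOME)
  mkdir -p "$2/lib"
  cp "$1"/sort-connector-*.jar "$2/lib/"
}

# Example usage (hypothetical paths):
# install_connectors ./apache-inlong-[version]-sort-connectors "$FLINK_HOME"
```

The MySQL driver JAR for `mysql-cdc-inlong` can be copied alongside in the same way, or the glob widened to `*.jar`.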
## Start an inlong-sort application ```shell diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/tube.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/tube.md new file mode 100644 index 0000000000..d09c83ed88 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/tube.md @@ -0,0 +1,69 @@ +--- +title: TubeMQ +sidebar_position: 11 +--- + +import {siteVariables} from '../../version'; + +## 概述 + +[Apache InLong TubeMQ](https://inlong.apache.org/docs/modules/tubemq/overview) 是一个分布式、开源的 pub-sub 消息传递和流平台, 适合于万亿规模数据。 + +## 版本 + +| 抽取节点 | 版本 | +| --------------------- | ------------------------------------------------------------ | +| [TubeMQ](./tube.md) | [TubeMQ](https://inlong.apache.org/docs/next/modules/tubemq/overview): >=0.1.0<br/> | + +## 依赖项 + +为了设置 TubeMQ Extract 节点,下面提供了使用构建自动化工具(例如 Maven 或 SBT)和带有 Sort Connectors JAR 包的 SQL 客户端的两个项目的依赖关系信息。 + +### Maven 依赖 + +<pre><code parentName="pre"> +{`<dependency> + <groupId>org.apache.inlong</groupId> + <artifactId>sort-connector-tubemq</artifactId> + <version>${siteVariables.inLongVersion}</version> +</dependency> +`} +</code></pre> + +## 如何创建TubeMQ抽取节点 + +### SQL API 的使用 + +使用 `Flink SQL Cli` : +```sql +-- Create a TubeMQ table 'tube_extract_node' in Flink SQL Cli +Flink SQL> CREATE TABLE tube_extract_node ( + id INT, + name STRING, + age INT, + salary FLOAT + ) WITH ( + 'connector' = 'tubemq', + 'topic' = 'topicName', + 'masterRpc' = 'rpcUrl', -- 127.0.0.1:8715 + 'format' = 'json', + 'groupId' = 'groupName'); + +-- Read data from tube_extract_node +Flink SQL> SELECT * FROM tube_extract_node; +``` +### InLong Dashboard 方式 +TODO + +### InLong Manager Client 方式 +TODO + +## TubeMQ 抽取节点参数信息 + +| 参数 | 是否必须 | 默认值 | 数据类型 | 描述 | +| ----------------------------- | -------- | ------------- | ------ | ------------------------------------------------------------ | +| connector | required | tubemq | String | 设置连接器类型 `tubemq` | +| 
topic | required | (none) | String | 设置抽取的topic | | masterRpc | required | (none) | String | 设置TubeMQ master service 地址 | | format | required | (none) | String | TubeMQ 数据类型, 支持 JSON, Avro, etc. For more information, see the [Flink format](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/formats/overview/). | | groupId | required | (none) | String | TubeMQ 消费组 | \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md index d02e5f4504..6cd8fb5431 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md @@ -25,6 +25,22 @@ InLong Agent本身作为数据采集框架,采用channel + plugin架构构建 目录正则过滤,支持YYYYMMDD+正则表达式的路径配置 断点重传,InLong-Agent重启时,能够支持自动从上次读取位置重新读取,保证不重读不漏读。 +#### 文件采集参数 +| 参数 | 是否必须 | 默认值 | 类型 | 描述 | +| ----------------------------- | -------- | ------------- | ------ | ------------------------------------------------------------ | +| pattern | required | (none) | String | 文件正则匹配,例如: /root/[*].log | +| timeOffset | optional | (none) | String | 文件偏移匹配,针对文件名称为: *** YYYYMMDDHH *** 的文件,YYYY 表示年, MM 表示月, DD 表示天, HH 表示小时, *** 表示任意的字符;'1m' 表示一分钟以后, '-1m' 表示一分钟以前, '1h' 一小时以后, '-1h' 一小时以前, '1d' 一天以后, '-1d' 一天以前。| +| collectType | optional | FULL | String | "FULL" 目录下所有匹配的文件, "INCREMENT" 任务启动后匹配新增的文件。 | +| lineEndPattern | optional | '\n' | String | 文件行结束正则匹配。 | +| contentCollectType | optional | FULL | String | 文件内容采集方式,全量 "FULL"、增量 "INCREMENT"。| +| envList | optional | (none) | String | 文件采集携带环境信息,例如在容器环境下: kubernetes 。 | +| dataContentStyle | optional | (none) | String | 采集后数据输出方式, Json 格式设置为 json ; CSV 格式设置自定义分割符,例如 `,` 或 `:` 。 | +| dataSeparator | optional | (none) | String | 文件数据原始列分割方式。 | +| monitorStatus | optional | (none) | Integer| 文件监控开关,1 开启、0 关闭。场景:批量数据采集时设置为 0,实时数据采集时设置为 1。 | +| monitorInterval | optional
| (none) | Long | 文件监控探测频率,单位为毫秒 | +| monitorExpire | optional | (none) | Long | 文件监控探测过期时间,单位为毫秒 | + + ### Sql 这类数据是指通过SQL执行的方式 SQL正则分解,转化成多条SQL语句 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md index d64f66c27e..6eca26e5bc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md @@ -12,7 +12,8 @@ sidebar_position: 2 - InLong Sort 运行文件,[下载](https://inlong.apache.org/zh-CN/download/) `apache-inlong-[version]-bin.tar.gz` - 数据节点 Connectors,[下载](https://inlong.apache.org/zh-CN/download/) `apache-inlong-[version]-sort-connectors.tar.gz` -注意:Connectors 下载后可以将需要的 jars 放到`FLINK_HOME/lib/`下。 +注意:Connectors 下载后可以将需要的 jars 放到 `FLINK_HOME/lib/` 下。 +如果使用 `mysql-cdc-inlong` 连接器,请将 [mysql-connector-java:8.0.21.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.21/mysql-connector-java-8.0.21.jar) 包放到 `FLINK_HOME/lib/` 下。 ## 启动 InLong Sort ```