This is an automated email from the ASF dual-hosted git repository.

wanghailin pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
     new e5da7e840d  [Feature][doc][Connector-V2][Common] Add Common connector documentation (#5453)
e5da7e840d is described below

commit e5da7e840d67f0e845166d2ab8c9c70afe143cdb
Author: ZhilinLi <zhilinli0...@gmail.com>
AuthorDate: Sat Jun 15 10:02:13 2024 +0800

    [Feature][doc][Connector-V2][Common] Add Common connector documentation (#5453)
---
 docs/en/connector-v2/sink/common-options.md   | 21 +++-----
 docs/en/connector-v2/source/common-options.md | 78 +++++++++++++++++++++------
 docs/en/transform-v2/common-options.md        | 66 ++++++++++++++++++-----
 3 files changed, 125 insertions(+), 40 deletions(-)

diff --git a/docs/en/connector-v2/sink/common-options.md b/docs/en/connector-v2/sink/common-options.md
index 2addc49278..bfcdc26a2b 100644
--- a/docs/en/connector-v2/sink/common-options.md
+++ b/docs/en/connector-v2/sink/common-options.md
@@ -2,24 +2,19 @@
 
 > Common parameters of sink connectors
 
-| name              | type   | required | default value |
-|-------------------|--------|----------|---------------|
-| source_table_name | string | no       | -             |
-| parallelism       | int    | no       | -             |
+| Name              | Type   | Required | Default | Description |
+|-------------------|--------|----------|---------|-------------|
+| source_table_name | String | No       | -       | When `source_table_name` is not specified, the current plugin processes the data set `dataset` output by the previous plugin in the configuration file. <br/> When `source_table_name` is specified, the current plugin processes the data set corresponding to this parameter. |
 
-### source_table_name [string]
+## Important note
 
-When `source_table_name` is not specified, the current plug-in processes the data set `dataset` output by the previous plugin in the configuration file;
+When you specify `source_table_name` in a job configuration, you must also set the `result_table_name` parameter on the plugin that produces the data.
 
-When `source_table_name` is specified, the current plug-in is processing the data set corresponding to this parameter.
+## Task Example
 
-### parallelism [int]
+### Simple:
 
-When `parallelism` is not specified, the `parallelism` in env is used by default.
-
-When parallelism is specified, it will override the parallelism in env.
-
-## Examples
+> This example passes one data source through two transforms and delivers two different pipelines to different sinks
 
 ```bash
 source {
diff --git a/docs/en/connector-v2/source/common-options.md b/docs/en/connector-v2/source/common-options.md
index a9e607b28e..079f40663a 100644
--- a/docs/en/connector-v2/source/common-options.md
+++ b/docs/en/connector-v2/source/common-options.md
@@ -2,32 +2,80 @@
 
 > Common parameters of source connectors
 
-| name              | type   | required | default value |
-|-------------------|--------|----------|---------------|
-| result_table_name | string | no       | -             |
-| parallelism       | int    | no       | -             |
+| Name              | Type   | Required | Default | Description |
+|-------------------|--------|----------|---------|-------------|
+| result_table_name | String | No       | -       | When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)`. <br/>When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)`. The data set registered here can be directly accessed by other plugins by specifying `source_table_name`. |
+| parallelism       | Int    | No       | -       | When `parallelism` is not specified, the `parallelism` in env is used by default. <br/>When `parallelism` is specified, it will override the `parallelism` in env. |
 
-### result_table_name [string]
+## Important note
 
-When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` ;
+When a job configuration specifies `result_table_name`, you must also set the `source_table_name` parameter on the plugin that consumes the data.
 
-When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` . The data set `(dataStream/dataset)` registered here can be directly accessed by other plugins by specifying `source_table_name` .
+## Task Example
 
-### parallelism [int]
+### Simple:
 
-When `parallelism` is not specified, the `parallelism` in env is used by default.
-
-When parallelism is specified, it will override the parallelism in env.
-
-## Example
+> This registers a stream or batch data source and returns the registered table name `fake_table`
 
 ```bash
 source {
   FakeSourceStream {
-    result_table_name = "fake"
+    result_table_name = "fake_table"
   }
 }
 ```
 
-> The result of the data source `FakeSourceStream` will be registered as a temporary table named `fake` . This temporary table can be used by any `Transform` or `Sink` plugin by specifying `source_table_name` .
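+
+> For illustration, a minimal sketch (reusing the `Console` sink that appears in the examples below) of how a downstream plugin consumes the registered table; its `source_table_name` must match the `result_table_name` set above:
+
+```bash
+sink {
+  # Reads the data set registered by the source above
+  Console {
+    source_table_name = "fake_table"
+  }
+}
+```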
+### Multiple Pipeline Simple
+
+> This example converts the data source `fake` and writes it to two different sinks
+
+```bash
+env {
+  job.mode = "BATCH"
+}
+
+source {
+  FakeSource {
+    result_table_name = "fake"
+    row.num = 100
+    schema = {
+      fields {
+        id = "int"
+        name = "string"
+        age = "int"
+        c_timestamp = "timestamp"
+        c_date = "date"
+        c_map = "map<string, string>"
+        c_array = "array<int>"
+        c_decimal = "decimal(30, 8)"
+        c_row = {
+          c_row = {
+            c_int = int
+          }
+        }
+      }
+    }
+  }
+}
+
+transform {
+  Sql {
+    source_table_name = "fake"
+    result_table_name = "fake1"
+    # The table name in the query must be the same as the value of 'source_table_name'
+    query = "select id, regexp_replace(name, '.+', 'b') as name, age+1 as age, pi() as pi, c_timestamp, c_date, c_map, c_array, c_decimal, c_row from fake"
+  }
+  # The SQL transform supports basic functions and filter operations,
+  # but complex SQL is not supported yet, including multi-table JOINs, aggregations, and the like
+}
+
+sink {
+  Console {
+    source_table_name = "fake1"
+  }
+  Console {
+    source_table_name = "fake"
+  }
+}
+```
diff --git a/docs/en/transform-v2/common-options.md b/docs/en/transform-v2/common-options.md
index c45b4ba167..ce88ce8528 100644
--- a/docs/en/transform-v2/common-options.md
+++ b/docs/en/transform-v2/common-options.md
@@ -1,23 +1,65 @@
 # Transform Common Options
 
-> Common parameters of source connectors
+> Transforms are the intermediate conversion step between the source and the sink; you can use SQL statements to complete the conversion process smoothly
 
-| name              | type   | required | default value |
-|-------------------|--------|----------|---------------|
-| result_table_name | string | no       | -             |
-| source_table_name | string | no       | -             |
+| Name              | Type   | Required | Default | Description |
+|-------------------|--------|----------|---------|-------------|
+| result_table_name | String | No       | -       | When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set that can be directly accessed by other plugins, or called a temporary table `(table)`; <br/>When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)`. The data set registered here can be directly accessed by other plugins by specifying `source_table_name`. |
+| source_table_name | String | No       | -       | When `source_table_name` is not specified, the current plugin processes the data set `(dataset)` output by the previous plugin in the configuration file; <br/>When `source_table_name` is specified, the current plugin processes the data set corresponding to this parameter. |
 
-### source_table_name [string]
+## Task Example
 
-When `source_table_name` is not specified, the current plug-in processes the data set `(dataset)` output by the previous plug-in in the configuration file;
+### Simple:
 
-When `source_table_name` is specified, the current plugin is processing the data set corresponding to this parameter.
+> This example converts the data source `fake` and writes it to two different sinks; see the `transform` documentation for details
 
-### result_table_name [string]
+```bash
+env {
+  job.mode = "BATCH"
+}
 
-When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set that can be directly accessed by other plugins, or called a temporary table `(table)`;
+source {
+  FakeSource {
+    result_table_name = "fake"
+    row.num = 100
+    schema = {
+      fields {
+        id = "int"
+        name = "string"
+        age = "int"
+        c_timestamp = "timestamp"
+        c_date = "date"
+        c_map = "map<string, string>"
+        c_array = "array<int>"
+        c_decimal = "decimal(30, 8)"
+        c_row = {
+          c_row = {
+            c_int = int
+          }
+        }
+      }
+    }
+  }
+}
 
-When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` . The dataset registered here can be directly accessed by other plugins by specifying `source_table_name` .
+transform {
+  Sql {
+    source_table_name = "fake"
+    result_table_name = "fake1"
+    # The table name in the query must be the same as the value of 'source_table_name'
+    query = "select id, regexp_replace(name, '.+', 'b') as name, age+1 as age, pi() as pi, c_timestamp, c_date, c_map, c_array, c_decimal, c_row from fake"
+  }
+  # The SQL transform supports basic functions and filter operations,
+  # but complex SQL is not supported yet, including multi-table JOINs, aggregations, and the like
+}
 
-## Examples
+sink {
+  Console {
+    source_table_name = "fake1"
+  }
+  Console {
+    source_table_name = "fake"
+  }
+}
+```
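+
+> As a minimal sketch of the default wiring described in the table above (illustrative only; it assumes the `Filter` transform is available): when neither `result_table_name` nor `source_table_name` is set, each plugin simply consumes the output of the plugin declared before it:
+
+```bash
+env {
+  job.mode = "BATCH"
+}
+
+source {
+  # No result_table_name: the output is not registered as a named
+  # temporary table; it flows to the next plugin directly
+  FakeSource {
+    row.num = 10
+    schema = {
+      fields {
+        id = "int"
+        name = "string"
+      }
+    }
+  }
+}
+
+transform {
+  # No source_table_name / result_table_name: reads the previous
+  # plugin's output and passes its result straight on to the sink
+  Filter {
+    fields = [id, name]
+  }
+}
+
+sink {
+  Console {
+  }
+}
+```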