This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 4fc70441e55 [doc](routine load) update routine load doc about max_batch_interval and fix some error (#1236)
4fc70441e55 is described below

commit 4fc70441e5538997d1cad581bdc70c061b56ca3e
Author: hui lai <1353307...@qq.com>
AuthorDate: Thu Oct 31 11:08:45 2024 +0800

    [doc](routine load) update routine load doc about max_batch_interval and fix some error (#1236)

    # Versions
    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [x] 2.0

    # Languages
    - [x] Chinese
    - [x] English
---
 docs/data-operate/import/import-way/routine-load-manual.md            | 4 ++--
 .../current/data-operate/import/import-way/routine-load-manual.md     | 2 +-
 .../version-2.0/data-operate/import/routine-load-manual.md            | 4 ++--
 .../version-2.1/data-operate/import/import-way/routine-load-manual.md | 2 +-
 .../version-3.0/data-operate/import/import-way/routine-load-manual.md | 2 +-
 versioned_docs/version-2.0/data-operate/import/routine-load-manual.md | 4 ++--
 .../version-2.1/data-operate/import/import-way/routine-load-manual.md | 4 ++--
 .../version-3.0/data-operate/import/import-way/routine-load-manual.md | 4 ++--
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/data-operate/import/import-way/routine-load-manual.md b/docs/data-operate/import/import-way/routine-load-manual.md
index 164299d69f4..e8b7d5612de 100644
--- a/docs/data-operate/import/import-way/routine-load-manual.md
+++ b/docs/data-operate/import/import-way/routine-load-manual.md
@@ -427,9 +427,9 @@ Here are the available parameters for the job_properties clause:

 | Parameter | Description |
 | --------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | <ul><li>Default value: 256</li><li>Description: Specifies the desired concurrency for a single load subtask (load task). It modifies the expected number of load subtasks for a Routine Load job. The actual concurrency during the load process may not be equal to the desired concurrency. The actual concurrency is determined based on factors such as the number of nodes in the cluster, the load on the cluster, and the characteristics of the data source. The act [...]
-| max_batch_interval | The maximum running time for each subtask, in seconds. The range is from 1s to 60s, with a default value of 10s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_interval | The maximum running time for each subtask, in seconds. Must be greater than 0, with a default value of 60s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200,000. The default value is 20,000,000. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
-| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 1GB. The default value is 1G. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 10GB. The default value is 1G. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_error_number | The maximum number of error rows allowed within a sampling window. Must be greater than or equal to 0. The default value is 0, which means no error rows are allowed. The sampling window is `max_batch_rows * 10`. If the number of error rows within the sampling window exceeds `max_error_number`, the regular job will be paused and manual intervention is required to check for data quality issues using the [SHOW ROUTINE LOAD](../../../sql-manual/sql-statements/ [...]
 | strict_mode | Whether to enable strict mode. The default value is disabled. Strict mode applies strict filtering to type conversions during the load process. If enabled, non-null original data that results in a NULL after type conversion will be filtered out. The filtering rules in strict mode are as follows:<ul><li>Derived columns (generated by functions) are not affected by strict mode.</li><li>If a column's type needs to be converted, any data with an incorrect data [...]
 | timezone | Specifies the time zone used by the load job. The default is to use the session's timezone parameter. This parameter affects the results of all timezone-related functions involved in the load. |
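For reference, the job_properties rows above are set in the PROPERTIES clause of CREATE ROUTINE LOAD. Below is a minimal, hypothetical sketch of a job that spells out the updated values documented in this change; the database, table, job, broker, and topic names are placeholders and not part of the commit:

```sql
-- Hypothetical example: load CSV rows from a Kafka topic into example_db.example_tbl.
-- "max_batch_interval" = "60" matches the new default documented above;
-- "max_batch_size" = "1073741824" (1G) sits inside the documented 100MB-10GB range.
CREATE ROUTINE LOAD example_db.example_job ON example_tbl
COLUMNS TERMINATED BY ","
PROPERTIES
(
    "desired_concurrent_number" = "5",
    "max_batch_interval" = "60",
    "max_batch_rows" = "20000000",
    "max_batch_size" = "1073741824"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "example_topic"
);
```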
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
index 337021afc60..b46036eae76 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md
@@ -439,7 +439,7 @@ The specific parameter options of the job_properties clause are as follows:

 | Parameter | Description |
 | ------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | <p>Default value: 256</p> <p>Description: the desired concurrency of a single load subtask (load task); it modifies the expected number of load subtasks that a Routine Load job is split into. During the load, the expected subtask concurrency may not equal the actual concurrency. The actual concurrency is determined from the number of nodes in the cluster, the cluster load, and the data source; the actual number of load subtasks can be computed with the following formula:</p> <p>`min(topic_partition_num, desired_concurrent_number, max_routine_load_task_concurrent_num)`, where:</p> <p>- topic_partition_num is the number of partitions of the Kafka topic</p> <p>- desired_concurrent_number is the configured value of this parameter</p> <p>- max_routine_load_task_concurrent_num is the FE parameter that sets the maximum task concurrency of Routine Load</p> |
-| max_batch_interval | The maximum running time of each subtask, in seconds. The range is 1s to 60s, and the default value is 10 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
+| max_batch_interval | The maximum running time of each subtask, in seconds. Must be greater than 0, and the default value is 60 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200000. The default is 20000000. max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is 100MB to 1GB. The default is 1G. max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_error_number | The maximum number of error rows allowed within the sampling window. Must be greater than or equal to 0. The default is 0, meaning no error rows are allowed. The sampling window is `max_batch_rows * 10`; if the number of error rows within the sampling window exceeds `max_error_number`, the routine job is paused and manual intervention is required to check for data quality problems via `ErrorLogUrls` in the [SHOW ROUTINE LOAD](../../../sql-manual/sql-statements/Show-Statements/SHOW-ROUTINE-LOAD) command. Rows filtered out by the where condition do not count as error rows. |
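The desired_concurrent_number rows above reduce to a single min() over three values. As a purely illustrative check (all three numbers are assumptions, not values from this commit): a 12-partition topic, desired_concurrent_number = 5, and max_routine_load_task_concurrent_num = 10 yield 5 concurrent subtasks, since neither the partition count nor the FE cap binds:

```sql
-- LEAST() stands in for the min() in the formula above; the result is 5.
SELECT LEAST(12, 5, 10) AS actual_task_num;
```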
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/routine-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/routine-load-manual.md
index 29e7b13ab7e..1cd0a052d8d 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/routine-load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/routine-load-manual.md
@@ -439,9 +439,9 @@ The specific parameter options of the job_properties clause are as follows:

 | Parameter | Description |
 | ------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | Default value: 256 <br />Description: the desired concurrency of a single load subtask (load task); it modifies the expected number of load subtasks that a Routine Load job is split into. During the load, the expected subtask concurrency may not equal the actual concurrency. The actual concurrency is determined from the number of nodes in the cluster, the cluster load, and the data source; the actual number of load subtasks can be computed with the following formula:<br />`min(topic_partition_num, desired_concurrent_number, max_routine_load_task_concurrent_num)`, where:<br />- topic_partition_num is the number of partitions of the Kafka topic<br />- desired_concurrent_number is the configured value of this parameter<br />- max_routine_load_task_concurrent_num is the FE parameter that sets the maximum task concurrency of Routine Load |
-| max_batch_interval | The maximum running time of each subtask, in seconds. The range is 1s to 60s, and the default value is 10 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
+| max_batch_interval | The maximum running time of each subtask, in seconds. Must be greater than 0, and the default value is 60 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200000. The default is 200000 (20000000 for versions 2.0.13 and later). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
-| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is 100MB to 1GB. The default is 100MB (1G for versions 2.0.13 and later). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
+| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is 100MB to 10GB. The default is 100MB (1G for versions 2.0.13 and later). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_error_number | The maximum number of error rows allowed within the sampling window. Must be greater than or equal to 0. The default is 0, meaning no error rows are allowed. The sampling window is `max_batch_rows * 10`; if the number of error rows within the sampling window exceeds `max_error_number`, the routine job is paused and manual intervention is required to check for data quality problems via `ErrorLogUrls` in the [SHOW ROUTINE LOAD](../../sql-manual/sql-reference/Show-Statements/SHOW-ROUTINE-LOAD) command. Rows filtered out by the where condition do not count as error rows. |
 | strict_mode | Whether to enable strict mode; disabled by default. Strict mode applies strict filtering to column type conversions during the load: if enabled, non-null original data whose type conversion yields NULL is filtered out.<br />The strict-mode filtering rules are as follows:<br />- Derived columns (generated by function transformation) are not affected by strict mode <br />- When a column type needs conversion, data of the wrong type is filtered out; check the columns filtered for data type errors in the `ErrorLogUrls` of [SHOW ROUTINE LOAD](../../sql-manual/sql-reference/Show-Statements/SHOW-ROUTINE-LOAD) <br />- For a loaded column whose type carries range restrictions, if the original data passes the type conversion but fails the range restriction, strict mode does not affect it. For example: if the type is decimal(1,0) and the original data is 10, the value passes the type conversion but is outside the declared range of the column; strict mode does not affect such data. For details, see [Strict Mode]( ../../../da [...]
 | timezone | Specifies the time zone used by the load job. Defaults to the session's timezone parameter. This parameter affects the results of all time zone-related functions involved in the load. |
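The strict_mode, timezone, and where-filter behavior described in the rows above also maps directly onto the job definition. A hypothetical sketch (job, table, column, broker, and topic names are placeholders), illustrating the doc's point that rows dropped by WHERE do not count toward max_error_number:

```sql
-- Hypothetical: strict type-conversion filtering plus a fixed load time zone.
-- Rows removed by the WHERE predicate are not counted as error rows.
CREATE ROUTINE LOAD example_db.strict_job ON example_tbl
WHERE event_type != "debug"
PROPERTIES
(
    "strict_mode" = "true",
    "timezone" = "Asia/Shanghai"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092",
    "kafka_topic" = "example_topic"
);
```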
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/routine-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/routine-load-manual.md
index 64e135b4113..0b3e6f34f5a 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/routine-load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/routine-load-manual.md
@@ -439,7 +439,7 @@ The specific parameter options of the job_properties clause are as follows:

 | Parameter | Description |
 | ------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | <p>Default value: 256</p> <p>Description: the desired concurrency of a single load subtask (load task); it modifies the expected number of load subtasks that a Routine Load job is split into. During the load, the expected subtask concurrency may not equal the actual concurrency. The actual concurrency is determined from the number of nodes in the cluster, the cluster load, and the data source; the actual number of load subtasks can be computed with the following formula:</p> <p>`min(topic_partition_num, desired_concurrent_number, max_routine_load_task_concurrent_num)`, where:</p> <p>- topic_partition_num is the number of partitions of the Kafka topic</p> <p>- desired_concurrent_number is the configured value of this parameter</p> <p>- max_routine_load_task_concurrent_num is the FE parameter that sets the maximum task concurrency of Routine Load</p> |
-| max_batch_interval | The maximum running time of each subtask, in seconds. The range is 1s to 60s, and the default value is 10 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
+| max_batch_interval | The maximum running time of each subtask, in seconds. Must be greater than 0, and the default value is 60 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200000. The default is 200000 (20000000 for versions 2.1.5 and later). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is 100MB to 1GB. The default is 100MB (1G for versions 2.1.5 and later). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_error_number | The maximum number of error rows allowed within the sampling window. Must be greater than or equal to 0. The default is 0, meaning no error rows are allowed. The sampling window is `max_batch_rows * 10`; if the number of error rows within the sampling window exceeds `max_error_number`, the routine job is paused and manual intervention is required to check for data quality problems via `ErrorLogUrls` in the [SHOW ROUTINE LOAD](../../../sql-manual/sql-statements/Show-Statements/SHOW-ROUTINE-LOAD) command. Rows filtered out by the where condition do not count as error rows. |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/import-way/routine-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/import-way/routine-load-manual.md
index 65925d27fa8..955eac82504 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/import-way/routine-load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/import-way/routine-load-manual.md
@@ -439,7 +439,7 @@ The specific parameter options of the job_properties clause are as follows:

 | Parameter | Description |
 | ------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | <p>Default value: 5</p> <p>Description: the desired concurrency of a single load subtask (load task); it modifies the expected number of load subtasks that a Routine Load job is split into. During the load, the expected subtask concurrency may not equal the actual concurrency. The actual concurrency is determined from the number of nodes in the cluster, the cluster load, and the data source; the actual number of load subtasks can be computed with the following formula:</p> <p>`min(topic_partition_num, desired_concurrent_number, max_routine_load_task_concurrent_num)`, where:</p> <p>- topic_partition_num is the number of partitions of the Kafka topic</p> <p>- desired_concurrent_number is the configured value of this parameter</p> <p>- max_routine_load_task_concurrent_num is the FE parameter that sets the maximum task concurrency of Routine Load</p> |
-| max_batch_interval | The maximum running time of each subtask, in seconds. The range is 1s to 60s, and the default value is 10 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
+| max_batch_interval | The maximum running time of each subtask, in seconds. Must be greater than 0, and the default value is 60 (s). max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200000. The default is 20000000. max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is 100MB to 1GB. The default is 1G. max_batch_interval/max_batch_rows/max_batch_size together form the execution thresholds of a subtask. When any of these parameters reaches its threshold, the load subtask ends and a new load subtask is generated. |
 | max_error_number | The maximum number of error rows allowed within the sampling window. Must be greater than or equal to 0. The default is 0, meaning no error rows are allowed. The sampling window is `max_batch_rows * 10`; if the number of error rows within the sampling window exceeds `max_error_number`, the routine job is paused and manual intervention is required to check for data quality problems via `ErrorLogUrls` in the [SHOW ROUTINE LOAD](../../../sql-manual/sql-statements/Show-Statements/SHOW-ROUTINE-LOAD) command. Rows filtered out by the where condition do not count as error rows. |
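Because the max_batch_interval bounds changed, a job created earlier with a small explicit interval can be adjusted rather than recreated. A hedged sketch, assuming a job named example_db.example_job (hypothetical) and that ALTER ROUTINE LOAD is applied to a paused job, as Doris requires:

```sql
-- Pause, raise the subtask interval to the new 60s default, then resume.
PAUSE ROUTINE LOAD FOR example_db.example_job;
ALTER ROUTINE LOAD FOR example_db.example_job
PROPERTIES ("max_batch_interval" = "60");
RESUME ROUTINE LOAD FOR example_db.example_job;
```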
diff --git a/versioned_docs/version-2.0/data-operate/import/routine-load-manual.md b/versioned_docs/version-2.0/data-operate/import/routine-load-manual.md
index 3bd9c250676..abeb5d78ce3 100644
--- a/versioned_docs/version-2.0/data-operate/import/routine-load-manual.md
+++ b/versioned_docs/version-2.0/data-operate/import/routine-load-manual.md
@@ -427,9 +427,9 @@ Here are the available parameters for the job_properties clause:

 | Parameter | Description |
 | --------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | <ul><li>Default value: 256</li><li>Description: Specifies the desired concurrency for a single load subtask (load task). It modifies the expected number of load subtasks for a Routine Load job. The actual concurrency during the load process may not be equal to the desired concurrency. The actual concurrency is determined based on factors such as the number of nodes in the cluster, the load on the cluster, and the characteristics of the data source. The act [...]
-| max_batch_interval | The maximum running time for each subtask, in seconds. The range is from 1s to 60s, with a default value of 10s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_interval | The maximum running time for each subtask, in seconds. Must be greater than 0, with a default value of 10s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200,000. The default value is 200,000 (the default value for versions 2.0.13 and higher is 20,000,000). max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
-| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 1GB. The default value is 100MB (the default value for versions 2.0.13 and higher is 1G). max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 10GB. The default value is 100MB (the default value for versions 2.0.13 and higher is 1G). max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_error_number | The maximum number of error rows allowed within a sampling window. Must be greater than or equal to 0. The default value is 0, which means no error rows are allowed. The sampling window is `max_batch_rows * 10`. If the number of error rows within the sampling window exceeds `max_error_number`, the regular job will be paused and manual intervention is required to check for data quality issues using the [SHOW ROUTINE LOAD](../../sql-manual/sql-reference/Show [...]
 | strict_mode | Whether to enable strict mode. The default value is disabled. Strict mode applies strict filtering to type conversions during the load process. If enabled, non-null original data that results in a NULL after type conversion will be filtered out. The filtering rules in strict mode are as follows:<ul><li>Derived columns (generated by functions) are not affected by strict mode.</li><li>If a column's type needs to be converted, any data with an incorrect data [...]
 | timezone | Specifies the time zone used by the load job. The default is to use the session's timezone parameter. This parameter affects the results of all timezone-related functions involved in the load. |
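As the max_error_number rows note, the sampling window is max_batch_rows * 10, so with the new 20,000,000-row default a job is paused only if error rows exceed the threshold within 200,000,000 rows. A sketch of the manual check those rows describe (the job name is a placeholder):

```sql
-- ErrorLogUrls in the output points at the filtered rows;
-- ReasonOfStateChanged explains why the job was paused.
SHOW ROUTINE LOAD FOR example_db.example_job\G
-- After the data quality issue is fixed, resume the job.
RESUME ROUTINE LOAD FOR example_db.example_job;
```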
diff --git a/versioned_docs/version-2.1/data-operate/import/import-way/routine-load-manual.md b/versioned_docs/version-2.1/data-operate/import/import-way/routine-load-manual.md
index 0b6351fc2b5..12e981ed507 100644
--- a/versioned_docs/version-2.1/data-operate/import/import-way/routine-load-manual.md
+++ b/versioned_docs/version-2.1/data-operate/import/import-way/routine-load-manual.md
@@ -427,9 +427,9 @@ Here are the available parameters for the job_properties clause:

 | Parameter | Description |
 | --------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | <ul><li>Default value: 256</li><li>Description: Specifies the desired concurrency for a single load subtask (load task). It modifies the expected number of load subtasks for a Routine Load job. The actual concurrency during the load process may not be equal to the desired concurrency. The actual concurrency is determined based on factors such as the number of nodes in the cluster, the load on the cluster, and the characteristics of the data source. The act [...]
-| max_batch_interval | The maximum running time for each subtask, in seconds. The range is from 1s to 60s, with a default value of 10s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_interval | The maximum running time for each subtask, in seconds. Must be greater than 0, with a default value of 60s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200,000. The default value is 200,000 (the default value for versions 2.1.5 and higher is 20,000,000). max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
-| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 1GB. The default value is 100MB (the default value for versions 2.1.5 and higher is 1G). max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 10GB. The default value is 100MB (the default value for versions 2.1.5 and higher is 1G). max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_error_number | The maximum number of error rows allowed within a sampling window. Must be greater than or equal to 0. The default value is 0, which means no error rows are allowed. The sampling window is `max_batch_rows * 10`. If the number of error rows within the sampling window exceeds `max_error_number`, the regular job will be paused and manual intervention is required to check for data quality issues using the [SHOW ROUTINE LOAD](../../../sql-manual/sql-statements/ [...]
 | strict_mode | Whether to enable strict mode. The default value is disabled. Strict mode applies strict filtering to type conversions during the load process. If enabled, non-null original data that results in a NULL after type conversion will be filtered out. The filtering rules in strict mode are as follows:<ul><li>Derived columns (generated by functions) are not affected by strict mode.</li><li>If a column's type needs to be converted, any data with an incorrect data [...]
 | timezone | Specifies the time zone used by the load job. The default is to use the session's timezone parameter. This parameter affects the results of all timezone-related functions involved in the load. |
diff --git a/versioned_docs/version-3.0/data-operate/import/import-way/routine-load-manual.md b/versioned_docs/version-3.0/data-operate/import/import-way/routine-load-manual.md
index 164299d69f4..14e85f3a7fe 100644
--- a/versioned_docs/version-3.0/data-operate/import/import-way/routine-load-manual.md
+++ b/versioned_docs/version-3.0/data-operate/import/import-way/routine-load-manual.md
@@ -427,9 +427,9 @@ Here are the available parameters for the job_properties clause:

 | Parameter | Description |
 | --------------------------- | ------------------------------------------------------------ |
 | desired_concurrent_number | <ul><li>Default value: 256</li><li>Description: Specifies the desired concurrency for a single load subtask (load task). It modifies the expected number of load subtasks for a Routine Load job. The actual concurrency during the load process may not be equal to the desired concurrency. The actual concurrency is determined based on factors such as the number of nodes in the cluster, the load on the cluster, and the characteristics of the data source. The act [...]
-| max_batch_interval | The maximum running time for each subtask, in seconds. The range is from 1s to 60s, with a default value of 10s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_interval | The maximum running time for each subtask, in seconds. Must be greater than 0, with a default value of 60s. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_batch_rows | The maximum number of rows read by each subtask. Must be greater than or equal to 200,000. The default value is 20,000,000. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
-| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 1GB. The default value is 1G. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
+| max_batch_size | The maximum number of bytes read by each subtask. The unit is bytes, and the range is from 100MB to 10GB. The default value is 1G. max_batch_interval/max_batch_rows/max_batch_size together form the execution threshold for subtasks. If any of these parameters reaches the threshold, the load subtask ends and a new one is generated. |
 | max_error_number | The maximum number of error rows allowed within a sampling window. Must be greater than or equal to 0. The default value is 0, which means no error rows are allowed. The sampling window is `max_batch_rows * 10`. If the number of error rows within the sampling window exceeds `max_error_number`, the regular job will be paused and manual intervention is required to check for data quality issues using the [SHOW ROUTINE LOAD](../../../sql-manual/sql-statements/ [...]
 | strict_mode | Whether to enable strict mode. The default value is disabled. Strict mode applies strict filtering to type conversions during the load process. If enabled, non-null original data that results in a NULL after type conversion will be filtered out. The filtering rules in strict mode are as follows:<ul><li>Derived columns (generated by functions) are not affected by strict mode.</li><li>If a column's type needs to be converted, any data with an incorrect data [...]
 | timezone | Specifies the time zone used by the load job. The default is to use the session's timezone parameter. This parameter affects the results of all timezone-related functions involved in the load. |

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org