This is an automated email from the ASF dual-hosted git repository.

Mryange pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 5fef3c1da22 [doc] add function IS_VALID_UTF8 doc (#3551)
5fef3c1da22 is described below

commit 5fef3c1da22841a3a99dd6fcd58c7475c3c92a23
Author: Mryange <[email protected]>
AuthorDate: Wed Jun 24 16:32:27 2026 +0800

    [doc] add function IS_VALID_UTF8 doc (#3551)
    
    ## Versions
    
    - [x] dev
    - [x] 4.x
    - [ ] 3.x
    - [ ] 2.1
    
    ## Languages
    
    - [ ] Chinese
    - [ ] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 .../string-functions/is-valid-utf8.md              | 146 +++++++++++++++++++++
 .../string-functions/is-valid-utf8.md              | 146 +++++++++++++++++++++
 .../string-functions/is-valid-utf8.md              | 146 +++++++++++++++++++++
 sidebars.ts                                        |   1 +
 .../string-functions/is-valid-utf8.md              | 146 +++++++++++++++++++++
 versioned_sidebars/version-4.x-sidebars.json       |   1 +
 6 files changed, 586 insertions(+)

diff --git a/docs/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md b/docs/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
new file mode 100644
index 00000000000..b4c099ddfd0
--- /dev/null
+++ b/docs/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
@@ -0,0 +1,146 @@
+---
+{
+    "title": "IS_VALID_UTF8",
+    "language": "en",
+    "description": "The IS_VALID_UTF8 function checks whether a string is valid UTF-8 encoded data. Returns true if the string is valid UTF-8, false otherwise."
+}
+---
+
+## Description
+
+The IS_VALID_UTF8 function checks whether a string is valid UTF-8 encoded data. It validates every byte sequence in the input and returns `true` if all sequences conform to the UTF-8 encoding standard, or `false` if any invalid byte sequence is found.
+
+This is useful when dealing with data imported from external sources (files, network streams, etc.) that may contain binary or incorrectly encoded content, and you need to verify data integrity before performing string operations.
+
+## Alias
+
+- `ISVALIDUTF8()`
+
+## Syntax
+
+```sql
+IS_VALID_UTF8(<str>)
+```
+
+## Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `<str>` | The string to validate. Type: VARCHAR or STRING |
+
+## Return Value
+
+Returns BOOLEAN type.
+
+- Returns `true` if the string is valid UTF-8 encoded data.
+- Returns `false` if the string contains any invalid UTF-8 byte sequence.
+
+Special cases:
+- If the parameter is NULL, returns NULL.
+- An empty string is considered valid UTF-8, returns `true`.
+
+## Examples
+
+1. Valid ASCII strings
+
+```sql
+SELECT IS_VALID_UTF8('hello');
+```
+
+```text
++------------------------+
+| is_valid_utf8('hello') |
++------------------------+
+|                      1 |
++------------------------+
+```
+
+2. Valid multi-byte UTF-8 characters (Chinese)
+
+```sql
+SELECT IS_VALID_UTF8('Hello, 世界');
+```
+
+```text
++-----------------------------+
+| is_valid_utf8('Hello, 世界') |
++-----------------------------+
+|                           1 |
++-----------------------------+
+```
+
+3. Empty string
+
+```sql
+SELECT IS_VALID_UTF8('');
+```
+
+```text
++--------------------+
+| is_valid_utf8('')  |
++--------------------+
+|                  1 |
++--------------------+
+```
+
+4. Invalid UTF-8 bytes (constructed via UNHEX)
+
+```sql
+SELECT IS_VALID_UTF8(UNHEX('C0AF'));
+```
+
+```text
++------------------------------+
+| is_valid_utf8(unhex('C0AF')) |
++------------------------------+
+|                            0 |
++------------------------------+
+```
+
+5. NULL value handling
+
+```sql
+SELECT IS_VALID_UTF8(NULL);
+```
+
+```text
++---------------------+
+| is_valid_utf8(NULL) |
++---------------------+
+|                NULL |
++---------------------+
+```
+
+6. Using with table data
+
+```sql
+CREATE TABLE test_utf8 (
+    id INT,
+    val VARCHAR(200)
+) DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO test_utf8 VALUES
+(1, 'hello'),
+(2, ''),
+(3, 'Hello, 世界'),
+(4, NULL);
+
+INSERT INTO test_utf8 VALUES (5, UNHEX('C0AF'));
+INSERT INTO test_utf8 VALUES (6, UNHEX('FF'));
+
+SELECT id, IS_VALID_UTF8(val) FROM test_utf8 ORDER BY id;
+```
+
+```text
++------+--------------------+
+| id   | is_valid_utf8(val) |
++------+--------------------+
+|    1 |                  1 |
+|    2 |                  1 |
+|    3 |                  1 |
+|    4 |               NULL |
+|    5 |                  0 |
+|    6 |                  0 |
++------+--------------------+
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
new file mode 100644
index 00000000000..e6b4c3f513f
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
@@ -0,0 +1,146 @@
+---
+{
+    "title": "IS_VALID_UTF8",
+    "language": "zh-CN",
+    "description": "IS_VALID_UTF8 函数用于检查字符串是否为合法的 UTF-8 编码数据。如果字符串是合法 UTF-8 则返回 true,否则返回 false。"
+}
+---
+
+## 描述
+
+IS_VALID_UTF8 函数用于检查字符串是否为合法的 UTF-8 编码数据。它会验证输入中的每个字节序列,如果所有序列都符合 UTF-8 编码标准则返回 `true`,如果发现任何非法字节序列则返回 `false`。
+
+该函数在处理从外部数据源(文件、网络流等)导入的数据时非常有用,这些数据可能包含二进制或编码错误的内容,您可以在执行字符串操作之前验证数据的完整性。
+
+## 别名
+
+- `ISVALIDUTF8()`
+
+## 语法
+
+```sql
+IS_VALID_UTF8(<str>)
+```
+
+## 参数
+
+| 参数 | 说明 |
+|------|------|
+| `<str>` | 需要验证的字符串。类型:VARCHAR 或 STRING |
+
+## 返回值
+
+返回 BOOLEAN 类型。
+
+- 如果字符串是合法的 UTF-8 编码数据,返回 `true`。
+- 如果字符串包含任何非法的 UTF-8 字节序列,返回 `false`。
+
+特殊情况:
+- 如果参数为 NULL,返回 NULL。
+- 空字符串被视为合法的 UTF-8,返回 `true`。
+
+## 示例
+
+1. 合法的 ASCII 字符串
+
+```sql
+SELECT IS_VALID_UTF8('hello');
+```
+
+```text
++------------------------+
+| is_valid_utf8('hello') |
++------------------------+
+|                      1 |
++------------------------+
+```
+
+2. 合法的多字节 UTF-8 字符(中文)
+
+```sql
+SELECT IS_VALID_UTF8('Hello, 世界');
+```
+
+```text
++-----------------------------+
+| is_valid_utf8('Hello, 世界') |
++-----------------------------+
+|                           1 |
++-----------------------------+
+```
+
+3. 空字符串
+
+```sql
+SELECT IS_VALID_UTF8('');
+```
+
+```text
++--------------------+
+| is_valid_utf8('')  |
++--------------------+
+|                  1 |
++--------------------+
+```
+
+4. 非法的 UTF-8 字节(通过 UNHEX 构造)
+
+```sql
+SELECT IS_VALID_UTF8(UNHEX('C0AF'));
+```
+
+```text
++------------------------------+
+| is_valid_utf8(unhex('C0AF')) |
++------------------------------+
+|                            0 |
++------------------------------+
+```
+
+5. NULL 值处理
+
+```sql
+SELECT IS_VALID_UTF8(NULL);
+```
+
+```text
++---------------------+
+| is_valid_utf8(NULL) |
++---------------------+
+|                NULL |
++---------------------+
+```
+
+6. 配合表数据使用
+
+```sql
+CREATE TABLE test_utf8 (
+    id INT,
+    val VARCHAR(200)
+) DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO test_utf8 VALUES
+(1, 'hello'),
+(2, ''),
+(3, 'Hello, 世界'),
+(4, NULL);
+
+INSERT INTO test_utf8 VALUES (5, UNHEX('C0AF'));
+INSERT INTO test_utf8 VALUES (6, UNHEX('FF'));
+
+SELECT id, IS_VALID_UTF8(val) FROM test_utf8 ORDER BY id;
+```
+
+```text
++------+--------------------+
+| id   | is_valid_utf8(val) |
++------+--------------------+
+|    1 |                  1 |
+|    2 |                  1 |
+|    3 |                  1 |
+|    4 |               NULL |
+|    5 |                  0 |
+|    6 |                  0 |
++------+--------------------+
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
new file mode 100644
index 00000000000..e6b4c3f513f
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
@@ -0,0 +1,146 @@
+---
+{
+    "title": "IS_VALID_UTF8",
+    "language": "zh-CN",
+    "description": "IS_VALID_UTF8 函数用于检查字符串是否为合法的 UTF-8 编码数据。如果字符串是合法 UTF-8 则返回 true,否则返回 false。"
+}
+---
+
+## 描述
+
+IS_VALID_UTF8 函数用于检查字符串是否为合法的 UTF-8 编码数据。它会验证输入中的每个字节序列,如果所有序列都符合 UTF-8 编码标准则返回 `true`,如果发现任何非法字节序列则返回 `false`。
+
+该函数在处理从外部数据源(文件、网络流等)导入的数据时非常有用,这些数据可能包含二进制或编码错误的内容,您可以在执行字符串操作之前验证数据的完整性。
+
+## 别名
+
+- `ISVALIDUTF8()`
+
+## 语法
+
+```sql
+IS_VALID_UTF8(<str>)
+```
+
+## 参数
+
+| 参数 | 说明 |
+|------|------|
+| `<str>` | 需要验证的字符串。类型:VARCHAR 或 STRING |
+
+## 返回值
+
+返回 BOOLEAN 类型。
+
+- 如果字符串是合法的 UTF-8 编码数据,返回 `true`。
+- 如果字符串包含任何非法的 UTF-8 字节序列,返回 `false`。
+
+特殊情况:
+- 如果参数为 NULL,返回 NULL。
+- 空字符串被视为合法的 UTF-8,返回 `true`。
+
+## 示例
+
+1. 合法的 ASCII 字符串
+
+```sql
+SELECT IS_VALID_UTF8('hello');
+```
+
+```text
++------------------------+
+| is_valid_utf8('hello') |
++------------------------+
+|                      1 |
++------------------------+
+```
+
+2. 合法的多字节 UTF-8 字符(中文)
+
+```sql
+SELECT IS_VALID_UTF8('Hello, 世界');
+```
+
+```text
++-----------------------------+
+| is_valid_utf8('Hello, 世界') |
++-----------------------------+
+|                           1 |
++-----------------------------+
+```
+
+3. 空字符串
+
+```sql
+SELECT IS_VALID_UTF8('');
+```
+
+```text
++--------------------+
+| is_valid_utf8('')  |
++--------------------+
+|                  1 |
++--------------------+
+```
+
+4. 非法的 UTF-8 字节(通过 UNHEX 构造)
+
+```sql
+SELECT IS_VALID_UTF8(UNHEX('C0AF'));
+```
+
+```text
++------------------------------+
+| is_valid_utf8(unhex('C0AF')) |
++------------------------------+
+|                            0 |
++------------------------------+
+```
+
+5. NULL 值处理
+
+```sql
+SELECT IS_VALID_UTF8(NULL);
+```
+
+```text
++---------------------+
+| is_valid_utf8(NULL) |
++---------------------+
+|                NULL |
++---------------------+
+```
+
+6. 配合表数据使用
+
+```sql
+CREATE TABLE test_utf8 (
+    id INT,
+    val VARCHAR(200)
+) DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO test_utf8 VALUES
+(1, 'hello'),
+(2, ''),
+(3, 'Hello, 世界'),
+(4, NULL);
+
+INSERT INTO test_utf8 VALUES (5, UNHEX('C0AF'));
+INSERT INTO test_utf8 VALUES (6, UNHEX('FF'));
+
+SELECT id, IS_VALID_UTF8(val) FROM test_utf8 ORDER BY id;
+```
+
+```text
++------+--------------------+
+| id   | is_valid_utf8(val) |
++------+--------------------+
+|    1 |                  1 |
+|    2 |                  1 |
+|    3 |                  1 |
+|    4 |               NULL |
+|    5 |                  0 |
+|    6 |                  0 |
++------+--------------------+
+```
diff --git a/sidebars.ts b/sidebars.ts
index 3edea9b6593..bff82e0f700 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -1485,6 +1485,7 @@ const sidebars: SidebarsConfig = {
                                         'sql-manual/sql-functions/scalar-functions/string-functions/instr',
                                         'sql-manual/sql-functions/scalar-functions/string-functions/int-to-uuid',
                                         'sql-manual/sql-functions/scalar-functions/string-functions/is-uuid',
+                                        'sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8',
                                         'sql-manual/sql-functions/scalar-functions/string-functions/lcase',
                                         'sql-manual/sql-functions/scalar-functions/string-functions/length',
                                         'sql-manual/sql-functions/scalar-functions/string-functions/locate',
diff --git a/versioned_docs/version-4.x/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md b/versioned_docs/version-4.x/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
new file mode 100644
index 00000000000..b4c099ddfd0
--- /dev/null
+++ b/versioned_docs/version-4.x/sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8.md
@@ -0,0 +1,146 @@
+---
+{
+    "title": "IS_VALID_UTF8",
+    "language": "en",
+    "description": "The IS_VALID_UTF8 function checks whether a string is valid UTF-8 encoded data. Returns true if the string is valid UTF-8, false otherwise."
+}
+---
+
+## Description
+
+The IS_VALID_UTF8 function checks whether a string is valid UTF-8 encoded data. It validates every byte sequence in the input and returns `true` if all sequences conform to the UTF-8 encoding standard, or `false` if any invalid byte sequence is found.
+
+This is useful when dealing with data imported from external sources (files, network streams, etc.) that may contain binary or incorrectly encoded content, and you need to verify data integrity before performing string operations.
+
+## Alias
+
+- `ISVALIDUTF8()`
+
+## Syntax
+
+```sql
+IS_VALID_UTF8(<str>)
+```
+
+## Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `<str>` | The string to validate. Type: VARCHAR or STRING |
+
+## Return Value
+
+Returns BOOLEAN type.
+
+- Returns `true` if the string is valid UTF-8 encoded data.
+- Returns `false` if the string contains any invalid UTF-8 byte sequence.
+
+Special cases:
+- If the parameter is NULL, returns NULL.
+- An empty string is considered valid UTF-8, returns `true`.
+
+## Examples
+
+1. Valid ASCII strings
+
+```sql
+SELECT IS_VALID_UTF8('hello');
+```
+
+```text
++------------------------+
+| is_valid_utf8('hello') |
++------------------------+
+|                      1 |
++------------------------+
+```
+
+2. Valid multi-byte UTF-8 characters (Chinese)
+
+```sql
+SELECT IS_VALID_UTF8('Hello, 世界');
+```
+
+```text
++-----------------------------+
+| is_valid_utf8('Hello, 世界') |
++-----------------------------+
+|                           1 |
++-----------------------------+
+```
+
+3. Empty string
+
+```sql
+SELECT IS_VALID_UTF8('');
+```
+
+```text
++--------------------+
+| is_valid_utf8('')  |
++--------------------+
+|                  1 |
++--------------------+
+```
+
+4. Invalid UTF-8 bytes (constructed via UNHEX)
+
+```sql
+SELECT IS_VALID_UTF8(UNHEX('C0AF'));
+```
+
+```text
++------------------------------+
+| is_valid_utf8(unhex('C0AF')) |
++------------------------------+
+|                            0 |
++------------------------------+
+```
+
+5. NULL value handling
+
+```sql
+SELECT IS_VALID_UTF8(NULL);
+```
+
+```text
++---------------------+
+| is_valid_utf8(NULL) |
++---------------------+
+|                NULL |
++---------------------+
+```
+
+6. Using with table data
+
+```sql
+CREATE TABLE test_utf8 (
+    id INT,
+    val VARCHAR(200)
+) DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO test_utf8 VALUES
+(1, 'hello'),
+(2, ''),
+(3, 'Hello, 世界'),
+(4, NULL);
+
+INSERT INTO test_utf8 VALUES (5, UNHEX('C0AF'));
+INSERT INTO test_utf8 VALUES (6, UNHEX('FF'));
+
+SELECT id, IS_VALID_UTF8(val) FROM test_utf8 ORDER BY id;
+```
+
+```text
++------+--------------------+
+| id   | is_valid_utf8(val) |
++------+--------------------+
+|    1 |                  1 |
+|    2 |                  1 |
+|    3 |                  1 |
+|    4 |               NULL |
+|    5 |                  0 |
+|    6 |                  0 |
++------+--------------------+
+```
diff --git a/versioned_sidebars/version-4.x-sidebars.json b/versioned_sidebars/version-4.x-sidebars.json
index 1c88549db0f..d3916b6459f 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -1662,6 +1662,7 @@
                         "sql-manual/sql-functions/scalar-functions/string-functions/instr",
                         "sql-manual/sql-functions/scalar-functions/string-functions/int-to-uuid",
                         "sql-manual/sql-functions/scalar-functions/string-functions/is-uuid",
+                        "sql-manual/sql-functions/scalar-functions/string-functions/is-valid-utf8",
                         "sql-manual/sql-functions/scalar-functions/string-functions/lcase",
                         "sql-manual/sql-functions/scalar-functions/string-functions/length",
                         "sql-manual/sql-functions/scalar-functions/string-functions/locate",
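
Note on the documented examples: the sketch below is a client-side Python approximation of the behavior the new page describes. It is not Doris code and says nothing about how Doris implements IS_VALID_UTF8. It assumes Python's strict UTF-8 decoder is a reasonable stand-in for "conforms to the UTF-8 encoding standard", which may differ from Doris on edge cases not covered by the page. The helper name `is_valid_utf8` is hypothetical.

```python
from typing import Optional


def is_valid_utf8(data: Optional[bytes]) -> Optional[bool]:
    """Approximate the documented semantics: None in, None out; else True/False."""
    if data is None:
        # Mirrors the documented NULL handling (NULL input returns NULL).
        return None
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same inputs used in the documented table example.
rows = [
    (1, "hello".encode("utf-8")),
    (2, b""),
    (3, "Hello, 世界".encode("utf-8")),
    (4, None),
    (5, bytes.fromhex("C0AF")),  # overlong two-byte form, rejected by strict decoders
    (6, bytes.fromhex("FF")),    # 0xFF never appears in well-formed UTF-8
]

results = {row_id: is_valid_utf8(val) for row_id, val in rows}
```

Under these assumptions, `results` lines up with the documented query output: rows 1 to 3 are valid, row 4 is NULL/None, and rows 5 and 6 are invalid.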

