This is an automated email from the ASF dual-hosted git repository.

aloyszhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new a7b95c23071 [INLONG-1009][Doc] Add InLongMsg format definition and 
usage doc (#1010)
a7b95c23071 is described below

commit a7b95c23071d0835717f94884421c0fb49d3af10
Author: Goson Zhang <4675...@qq.com>
AuthorDate: Sun Sep 29 09:55:34 2024 +0800

    [INLONG-1009][Doc] Add InLongMsg format definition and usage doc (#1010)
---
 docs/data_node/load_node/auto_consumption.md       |   2 +-
 .../binary_protocol/img/inlongmsg_frame.png        | Bin 0 -> 28605 bytes
 .../binary_protocol/img/inlongmsg_v1.png           | Bin 0 -> 27298 bytes
 .../binary_protocol/img/inlongmsg_v2.png           | Bin 0 -> 28570 bytes
 .../binary_protocol/img/inlongmsg_v3.png           | Bin 0 -> 29195 bytes
 .../binary_protocol/img/inlongmsg_v4.png           | Bin 0 -> 24826 bytes
 .../binary_protocol/img/inlongmsg_v4_bodydata.png  | Bin 0 -> 5355 bytes
 docs/development/binary_protocol/inlong_msg.md     | 191 ++++++++++++++++++++
 docs/development/inlong_msg.md                     |  50 ------
 .../data_node/load_node/auto_consumption.md        |   2 +-
 .../binary_protocol/img/inlongmsg_frame.png        | Bin 0 -> 28605 bytes
 .../binary_protocol/img/inlongmsg_v1.png           | Bin 0 -> 27298 bytes
 .../binary_protocol/img/inlongmsg_v2.png           | Bin 0 -> 28570 bytes
 .../binary_protocol/img/inlongmsg_v3.png           | Bin 0 -> 29195 bytes
 .../binary_protocol/img/inlongmsg_v4.png           | Bin 0 -> 24826 bytes
 .../binary_protocol/img/inlongmsg_v4_bodydata.png  | Bin 0 -> 5355 bytes
 .../development/binary_protocol/inlong_msg.md      | 195 +++++++++++++++++++++
 .../current/development/inlong_msg.md              |  50 ------
 18 files changed, 388 insertions(+), 102 deletions(-)

diff --git a/docs/data_node/load_node/auto_consumption.md 
b/docs/data_node/load_node/auto_consumption.md
index 847a2df1c14..972d8e71054 100644
--- a/docs/data_node/load_node/auto_consumption.md
+++ b/docs/data_node/load_node/auto_consumption.md
@@ -6,4 +6,4 @@ sidebar_position: 2
 ## Overview
 **Auto Consumption** means receiving data directly from Message Queue Services (TubeMQ or Pulsar). You can consume the messages from MQ
 by [Pulsar SDK Client](https://pulsar.apache.org/docs/en/2.8.3/client-libraries/) or [TubeMQ SDK Client](modules/tubemq/clients_java.md),
-after that, you have to [Parse the InLongMsg](development/inlong_msg.md) to 
get raw data for forward processing.
\ No newline at end of file
+after that, you need to [parse the InLongMsg](development/binary_protocol/inlong_msg.md) to get the raw data for further processing.
\ No newline at end of file
diff --git a/docs/development/binary_protocol/img/inlongmsg_frame.png 
b/docs/development/binary_protocol/img/inlongmsg_frame.png
new file mode 100644
index 00000000000..0c57142eedb
Binary files /dev/null and 
b/docs/development/binary_protocol/img/inlongmsg_frame.png differ
diff --git a/docs/development/binary_protocol/img/inlongmsg_v1.png 
b/docs/development/binary_protocol/img/inlongmsg_v1.png
new file mode 100644
index 00000000000..fdb1a1c1932
Binary files /dev/null and 
b/docs/development/binary_protocol/img/inlongmsg_v1.png differ
diff --git a/docs/development/binary_protocol/img/inlongmsg_v2.png 
b/docs/development/binary_protocol/img/inlongmsg_v2.png
new file mode 100644
index 00000000000..694025c796f
Binary files /dev/null and 
b/docs/development/binary_protocol/img/inlongmsg_v2.png differ
diff --git a/docs/development/binary_protocol/img/inlongmsg_v3.png 
b/docs/development/binary_protocol/img/inlongmsg_v3.png
new file mode 100644
index 00000000000..11207719f50
Binary files /dev/null and 
b/docs/development/binary_protocol/img/inlongmsg_v3.png differ
diff --git a/docs/development/binary_protocol/img/inlongmsg_v4.png 
b/docs/development/binary_protocol/img/inlongmsg_v4.png
new file mode 100644
index 00000000000..48e8284f9f2
Binary files /dev/null and 
b/docs/development/binary_protocol/img/inlongmsg_v4.png differ
diff --git a/docs/development/binary_protocol/img/inlongmsg_v4_bodydata.png 
b/docs/development/binary_protocol/img/inlongmsg_v4_bodydata.png
new file mode 100644
index 00000000000..2a10737e27f
Binary files /dev/null and 
b/docs/development/binary_protocol/img/inlongmsg_v4_bodydata.png differ
diff --git a/docs/development/binary_protocol/inlong_msg.md 
b/docs/development/binary_protocol/inlong_msg.md
new file mode 100644
index 00000000000..d1f157b8bc5
--- /dev/null
+++ b/docs/development/binary_protocol/inlong_msg.md
@@ -0,0 +1,191 @@
+---
+title: InLongMsg format definition and usage
+sidebar_position: 1
+---
+
+import {siteVariables} from '../../version';
+
+## Overview
+
+Users report data to the InLong system through the SDK, HTTP, Agent, and other reporting methods. InLong's DataProxy component packages the received data into the `InLongMsg` format and stores it in the body of the MQ message. After consuming data from MQ, users must decode it according to the `InLongMsg` format to obtain the originally reported data. This article describes the data structure of the `InLongMsg` format and how to parse such data after receiving it.
+
+## InLongMsg data format
+
+### Format framework
+
+InLongMsg is a binary data packet in a custom format. It consists of a formatted payload (Payload) enclosed by the same 2-byte magic number (Magic) at the front and at the back, as shown in the following figure:
+
+![InLongMsg frame](img/inlongmsg_frame.png)
+
+In the current implementation of InLongMsg, the Magic field has 4 valid values, which identify the 4 data versions that the Payload part can carry (MAGIC0 is an invalid value):
+
+```java
+    private static final byte[] MAGIC0 = {(byte) 0xf, (byte) 0x0};
+    private static final byte[] MAGIC1 = {(byte) 0xf, (byte) 0x1};
+    private static final byte[] MAGIC2 = {(byte) 0xf, (byte) 0x2};
+    private static final byte[] MAGIC3 = {(byte) 0xf, (byte) 0x3};
+    private static final byte[] MAGIC4 = {(byte) 0xf, (byte) 0x4};
+```
+The Payload part carries data in the format identified by the Magic field. Regardless of the format used, the content ultimately maps to the original data reported by the user as either {attribute set, single data} or {attribute set, multiple data}.
+The following sections describe the Payload definition for each Magic version value.
+
+### InLongMsg V1
+
+For the InLongMsg V1 format, the Magic field value is 0x0f01. When this value is used, the Payload format is as shown in the following figure:
+
+![InLongMsg V1](img/inlongmsg_v1.png)
+
+Where:
+
+- CreatTime: identifies the construction time of the InLongMsg message;
+
+- AttrDataCnt: identifies how many {attribute, data} pairs the message carries.
+
+The information following AttrDataCnt stores the {attribute, data} pairs one by one:
+
+- AttrLen, AttrData: define the length and value of the attribute information;
+
+- ItemsLen: identifies the total length of the data belonging to the attribute; this length includes the following Compress field;
+
+- Compress: identifies whether the data part that follows is compressed. If it is compressed, the data is organized in the format below after decompression. InLongMsg currently supports only Snappy data compression.
+
+Since an attribute may carry multiple data items, the data part must support multiple items:
+
+- ItemLen: identifies the length of the data item;
+
+- ItemData: identifies the data value.
+
+### InLongMsg V2
+
+For the InLongMsg V2 format, the Magic field value is 0x0f02. When this value 
is used, the Payload format is as shown in the following figure:
+
+![InLongMsg V2](img/inlongmsg_v2.png)
+
+The InLongMsg V2 format adds the MsgCnt and ItemCnt fields; all other fields have the same meaning as in the V1 format:
+
+- MsgCnt: identifies the total number of data items carried by the message;
+
+- ItemCnt: identifies the number of data items in the {attribute, data} pair.
+
+### InLongMsg V3
+
+For the InLongMsg V3 format, the Magic field value is 0x0f03. When this value 
is used, the Payload format is as shown in the following figure:
+
+![InLongMsg V3](img/inlongmsg_v3.png)
+
+Compared with the InLongMsg V1 and V2 formats, the InLongMsg V3 format mainly addresses the case of reporting {attribute set, multiple data} where each data item carries its own private attributes. The V3 format handles this by adding private attribute fields to each data part, as follows:
+
+- RecordLen: identifies the total length of a single data record;
+
+- IAttrLen: identifies the length of the private attribute carried by a single data item;
+
+- IitemAttr: identifies the private attribute value carried by a single data item.
+
+### InLongMsg V4
+
+For the InLongMsg V4 format, the Magic field value is 0x0f04. When this value 
is used, the Payload format is as shown in the following figure:
+
+![InLongMsg V4](img/inlongmsg_v4.png)
+
+Compared with the previous InLongMsg V1, V2, and V3 format definitions, InLongMsg V4 has two improvements:
+
+1. Fixed fields in the common attributes are extracted from the attribute key-value pairs and stored as fixed fields, which reduces the total message length;
+
+2. Different bits of some fixed fields carry different values to indicate which functions are enabled or which types are defined.
+
+The relevant fields are defined as follows:
+
+- TotalLen: identifies the total length of the entire message;
+
+- MsgType: a composite field that indicates both the message type and the compression type. The lower 5 bits indicate the message type, and the upper 3 bits indicate the compression method;
+
+- GroupId: identifies the ID value corresponding to the group, used when transmitting numeric group information;
+
+- StreamId: identifies the ID value corresponding to the stream, used when transmitting numeric stream information;
+
+- ExtField: identifies which extended functions are enabled for the message; different bits have different meanings. For details, see the ExtField bit definition table;
+
+- DataTime: identifies the data time, with precision in seconds;
+
+- MsgCnt: identifies the total number of messages carried;
+
+- UniqueId: an 8-byte long value that uniquely identifies the message;
+
+- BodyLen: identifies the total length of the message body, that is, the length of the binary BodyData that follows;
+
+- BodyData: identifies the binary message content carried by the message;
+
+- AttrLen: identifies the attribute length;
+
+- AttrData: identifies the attribute value content.
+
+For the ExtField field, each bit is defined as follows:
+
+| Bit | Meaning                                                   | Remark                                       |
+|-----|-----------------------------------------------------------|----------------------------------------------|
+| 0   | reserved                                                  |                                              |
+| 1   | Whether each data item contains private attributes        | 1 indicates inclusion, 0 indicates exclusion |
+| 2   | Whether numeric group and stream IDs are enabled          | 0 indicates enabled, 1 indicates not enabled |
+| 3   | reserved                                                  |                                              |
+| 4   | reserved                                                  |                                              |
+| 5   | Whether multiple data are separated by newline characters | 1 indicates enabled, 0 indicates not enabled |
+| 6   | reserved                                                  |                                              |
+| 7   | reserved                                                  |                                              |
+
+The BodyData field value has the following format:
+
+![InLongMsg V4 BodyData](img/inlongmsg_v4_bodydata.png)
+
+- ItemLen: identifies the data length;
+
+- ItemData: identifies the data value;
+
+- IAttrLen: identifies the private attribute length;
+
+- IitemAttr: identifies the private attribute value.
+
+## Parsing messages of type InLongMsg
+
+If you consume data directly from InLong's message queue (InLong TubeMQ or Pulsar), you need to parse the `InLongMsg` first. The original data can be parsed as follows.
+
+### Add Maven dependency
+
+<pre><code parentName="pre">
+{`<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>inlong-common</artifactId>
+    <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+### Add Parse Method
+
+```java
+public static List<byte[]> parserInLongMsg(byte[] bytes) {
+    List<byte[]> originalContentByteList = new ArrayList<>();
+    InLongMsg inLongMsg = InLongMsg.parseFrom(bytes);
+    Set<String> attrs = inLongMsg.getAttrs();
+    if (CollectionUtils.isEmpty(attrs)) {
+        return originalContentByteList;
+    }
+    for (String attr : attrs) {
+        if (attr == null) {
+            continue;
+        }
+        Iterator<byte[]> iterator = inLongMsg.getIterator(attr);
+        if (iterator == null) {
+            continue;
+        }
+        while (iterator.hasNext()) {
+            byte[] bodyBytes = iterator.next();
+            if (bodyBytes == null || bodyBytes.length == 0) {
+                continue;
+            }
+            // Original data sent by the InLong reporter
+            originalContentByteList.add(bodyBytes);
+        }
+    }
+    return originalContentByteList;
+}
+```
\ No newline at end of file
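A note on the framing described in the new `inlong_msg.md` above: a consumer could sanity-check the leading and trailing Magic bytes before handing a packet to the parser. The following is a minimal sketch, not the `inlong-common` implementation. The class `InLongMsgFrameCheck` and the method `detectVersion` are hypothetical names, and the sketch assumes that the first Magic byte is always `0x0f` and the second byte is the version number, as the MAGIC1 to MAGIC4 constants suggest.

```java
class InLongMsgFrameCheck {

    /**
     * Returns the format version (1 to 4) when the packet starts and ends
     * with the same valid 2-byte Magic, or -1 otherwise.
     * Hypothetical helper for illustration only.
     */
    static int detectVersion(byte[] packet) {
        // A frame needs at least the leading and trailing Magic (2 + 2 bytes).
        if (packet == null || packet.length < 4) {
            return -1;
        }
        int n = packet.length;
        // The Magic at the front and at the back must be identical.
        if (packet[0] != packet[n - 2] || packet[1] != packet[n - 1]) {
            return -1;
        }
        // Assumption: first byte is 0x0f, second byte is the version.
        if (packet[0] != 0x0f) {
            return -1;
        }
        int version = packet[1];
        // MAGIC0 (version 0) is documented as invalid.
        return (version >= 1 && version <= 4) ? version : -1;
    }

    public static void main(String[] args) {
        // A made-up V4 frame with a one-byte placeholder payload.
        byte[] sample = {0x0f, 0x04, 0x2a, 0x0f, 0x04};
        System.out.println(detectVersion(sample)); // prints 4
    }
}
```

A check like this only validates the envelope; decoding the Payload itself is still left to `InLongMsg.parseFrom`.
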
diff --git a/docs/development/inlong_msg.md b/docs/development/inlong_msg.md
deleted file mode 100644
index bbee8ed10a7..00000000000
--- a/docs/development/inlong_msg.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-title: Parse InLongMsg
-sidebar_position: 1
----
-
-import {siteVariables} from '../version';
-
-## Overview
-If you consume data directly from a message queue (InLong TubeMQ or Pulsar), 
you need to parse `InLongMsg` first. Origin data can be parsed in the following 
ways.
-
-## Dependency
-- Add Maven Dependency
-<pre><code parentName="pre">
-{`<dependency>
-    <groupId>org.apache.inlong</groupId>
-    <artifactId>inlong-common</artifactId>
-    <version>${siteVariables.inLongVersion}</version>
-</dependency>
-`}
-</code></pre>
-
-- Add Parse Method
-```java
-public static List<byte[]> parserInLongMsg(byte[] bytes) {
-    List<byte[]> originalContentByteList = new ArrayList<>();
-    InLongMsg inLongMsg = InLongMsg.parseFrom(bytes);
-    Set<String> attrs = inLongMsg.getAttrs();
-    if (CollectionUtils.isEmpty(attrs)) {
-        return originalContentByteList;
-    }
-    for (String attr : attrs) {
-        if (attr == null) {
-            continue;
-        }
-        Iterator<byte[]> iterator = inLongMsg.getIterator(attr);
-        if (iterator == null) {
-            continue;
-        }
-        while (iterator.hasNext()) {
-            byte[] bodyBytes = iterator.next();
-            if (bodyBytes == null || bodyBytes.length == 0) {
-                continue;
-            }
-            // Original data sent by the InLong agent
-            originalContentByteList.add(bodyBytes);
-        }
-    }
-    return originalContentByteList;
-}
-```
\ No newline at end of file
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/auto_consumption.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/auto_consumption.md
index f24a22899a6..85559588484 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/auto_consumption.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/auto_consumption.md
@@ -5,4 +5,4 @@ sidebar_position: 2
 
 ## Overview
 **自主消费** 是指直接从消息队列服务中 (TubeMQ or Pulsar) 消费数据, 你可以使用 [Pulsar SDK 
Client](https://pulsar.apache.org/docs/en/2.8.3/client-libraries/) 或者 [TubeMQ 
SDK Client](modules/tubemq/clients_java.md) 进行消费, 
-获取到数据后,需要通过 [解析 InLongMsg](development/inlong_msg.md) 获取原数据进行下一步处理。
\ No newline at end of file
+获取到数据后,需要通过 [解析 InLongMsg](development/binary_protocol/inlong_msg.md) 
获取原数据进行下一步处理。
\ No newline at end of file
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_frame.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_frame.png
new file mode 100644
index 00000000000..0c57142eedb
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_frame.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v1.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v1.png
new file mode 100644
index 00000000000..fdb1a1c1932
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v1.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v2.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v2.png
new file mode 100644
index 00000000000..694025c796f
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v2.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v3.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v3.png
new file mode 100644
index 00000000000..11207719f50
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v3.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v4.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v4.png
new file mode 100644
index 00000000000..48e8284f9f2
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v4.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v4_bodydata.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v4_bodydata.png
new file mode 100644
index 00000000000..2a10737e27f
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/img/inlongmsg_v4_bodydata.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/inlong_msg.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/inlong_msg.md
new file mode 100644
index 00000000000..c872df69007
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/binary_protocol/inlong_msg.md
@@ -0,0 +1,195 @@
+---
+title: InLongMsg 格式定义及使用
+sidebar_position: 1
+---
+
+import {siteVariables} from '../../version';
+
+## 概述
+
+用户通过 SDK、HTTP、Agent 等数据上报方式将数据上报到 InLong 系统,InLong 的 DataProxy 组件将接收到的数据打包成 
`InLongMsg` 格式并存储到 MQ 消息的消息体里。用户从 MQ 消费数据后需要按照 `InLongMsg` 
格式解码才能获得原始上报数据。本文主要介绍 `InLongMsg` 格式的数据结构,以及用户收到这类数据后如何解析。
+
+## InLongMsg 数据格式
+
+### 格式框架
+
+InLongMsg 是自定义格式的二进制数据包,由前后各 2 个字节的相同魔术数字(Magic)封装带格式的承载(Payload)信息组成,如下图示:
+
+![InLongMsg frame](img/inlongmsg_frame.png)
+
+Magic 字段在 InLongMsg 的当前实现里一共有 4 个有效值,分别标识 Payload 部分可携带的 4 种不同的数据版本(MAGIC0 
为无效值):
+
+```java
+    private static final byte[] MAGIC0 = {(byte) 0xf, (byte) 0x0};
+    private static final byte[] MAGIC1 = {(byte) 0xf, (byte) 0x1};
+    private static final byte[] MAGIC2 = {(byte) 0xf, (byte) 0x2};
+    private static final byte[] MAGIC3 = {(byte) 0xf, (byte) 0x3};
+    private static final byte[] MAGIC4 = {(byte) 0xf, (byte) 0x4};
+```
+Payload 部分根据上述 Magic 字段的定义携带对应格式的数据内容,这些内容不论采用什么样的格式最终都映射为用户按照 {属性集合,单条数据},或者 
{属性集合,多条数据} 上报的原始数据信息。
+
+接下来我们按照不同的 Magic 版本值介绍对应的 Payload 定义。
+
+### InLongMsg V1
+
+对于 InLongMsg V1 格式 Magic 字段值为 0x0f01,在该值时 Payload 部分格式如下图所示:
+
+![InLongMsg V1](img/inlongmsg_v1.png)
+
+其中:
+
+- CreatTime: 字段标识该条 InLongMsg 消息的构造时间;
+
+- AttrDataCnt: 字段标识该条消息里携带了多少个 {属性,数据} 对;
+  
+AttrDataCnt 接下来的信息则逐条存储 {属性,数据} 对信息
+
+- AttrLen, AttrData: 字段定义属性信息的长度及值;
+
+- ItemsLen: 字段标识该属性包含的整个数据长度信息,该字段包含接下来的 Compress 字段长度信息;
+
+- Compress: 字段标识紧接着的数据部分是否被压缩,如果被压缩则解压后按照接下来的格式进行组织,InLongMsg 目前仅支持 Snappy 
数据压缩方式;
+
+由于属性携带的数据可能是多条,因而数据部分要支持多条数据的情况:
+
+- ItemLen: 字段标识该项数据的长度;
+
+- ItemData: 字段标识数据值。
+
+### InLongMsg V2
+
+对于 InLongMsg V2 格式 Magic 字段值为 0x0f02,在该值时 Payload 部分格式如下图所示:
+
+![InLongMsg V2](img/inlongmsg_v2.png)
+
+相比 InLongMsg V1 格式,InLongMsg V2 格式除了新增的 MsgCnt、ItemCnt 字段外,其他字段含义与 V1 格式定义相同:
+
+- MsgCnt: 用来标识该消息携带的数据总条数;
+
+- ItemCnt:用来标识该 {属性,数据} 对信息里总数据个数。
+
+### InLongMsg V3
+
+对于 InLongMsg V3 格式 Magic 字段值为 0x0f03,在该值时 Payload 部分格式如下图所示:
+
+![InLongMsg V3](img/inlongmsg_v3.png)
+
+相比 InLongMsg V1 和 V2 格式,InLongMsg V3 格式主要解决 {属性集合,多条数据} 
对信息里,每条数据携带私有属性的数据上报情况,在 V3 格式定义里通过在每个数据部分增加数据私有属性字段来完成,具体如下:
+
+- RecordLen:用来标识单项数据记录总长度;
+
+- IAttrLen:用来标识单项数据携带的私有属性长;
+
+- IitemAttr:用来标识单项数据携带的私有属性数据值。
+
+
+### InLongMsg V4
+
+对于 InLongMsg V4 格式 Magic 字段值为 0x0f04,在该值时 Payload 部分格式如下图所示:
+
+![InLongMsg V4](img/inlongmsg_v4.png)
+
+相比之前的 InLongMsg V1,V2,V3 格式定义,InLongMsg V4 有 2 点改进:
+
+1. 将公共属性里的固定字段从属性键值对里抽取出来以固定字段形式保存,从而减小总的消息长度;
+
+2. 通过将部分固定字段的不同位携带不同的值标识不同的功能开启或类型定义。
+
+相关字段定义如下:
+
+- TotalLen:标识整个消息总长度;
+
+- MsgType:该字段是一个复合字段,标明消息的类型和压缩类型。其中低 5 位表示消息类型,高 3 位表示压缩方式,不同位标识不同含义;
+
+- GroupId:标识 group 对应的 ID 值,传递数字 group 信息时使用;
+
+- StreamId:标识 stream 对应的 ID 值,传递数字 stream 信息时使用;
+
+- ExtField:标识扩展功能启用字段,用来传递消息启用的扩展功能,不同位标识不同含义,具体参见 ExtField 各位定义表;
+
+- DataTime:标识数据时间,精度秒;
+
+- MsgCnt:标识携带的消息总条数;
+
+- UniqueId:标识消息 8 字节 long 型的唯一标记;
+
+- BodyLen:标识消息体总长度,标识接下来的二进制消息体数据长度;
+
+- BodyData:标识该消息携带的二进制消息内容;
+
+- AttrLen:标识属性长度;
+
+- AttrData:标识属性值内容。
+  
+对于 ExtField 字段,各位定义如下:
+
+| 位 | 含义                  | 备注             |
+|---|---------------------|----------------|
+| 0 | 保留                  |                |
+| 1 | 每个数据是否包含私有属性        | 1 标识包含,0 标识不包含 |
+| 2 | 是否启用数字 group,stream | 0 标识启用,1 标识不启用 |
+| 3 | 保留                  |                |
+| 4 | 保留                  |                |
+| 5 | 多条数据是否启用按换行符分隔      | 1 标识启用,0 标识不启用 |
+| 6 | 保留                  |                |
+| 7 | 保留                  |                |
+
+
+对于 BodyData 字段值,其格式如下:
+
+![InLongMsg V4 BodyData](img/inlongmsg_v4_bodydata.png)
+
+- ItemLen:标识数据长度;
+
+- ItemData:标识数据值;
+
+- IAttrLen:标识私有属性长度;
+
+- IitemAttr:标识私有属性值。
+
+
+## 解析 InLongMsg 类型的消息
+
+直接从 InLong 的消息队列(InLong TubeMQ 或 Pulsar)消费数据,需要先对 `InLongMsg` 进行解析。可通过以下方式解析出源数据。
+
+### 增加 maven 依赖
+
+<pre><code parentName="pre">
+{`<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>inlong-common</artifactId>
+    <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+### 增加解析逻辑
+
+```java
+public static List<byte[]> parserInLongMsg(byte[] bytes) {
+    List<byte[]> originalContentByteList = new ArrayList<>();
+    InLongMsg inLongMsg = InLongMsg.parseFrom(bytes);
+    Set<String> attrs = inLongMsg.getAttrs();
+    if (CollectionUtils.isEmpty(attrs)) {
+        return originalContentByteList;
+    }
+    for (String attr : attrs) {
+        if (attr == null) {
+            continue;
+        }
+        Iterator<byte[]> iterator = inLongMsg.getIterator(attr);
+        if (iterator == null) {
+            continue;
+        }
+        while (iterator.hasNext()) {
+            byte[] bodyBytes = iterator.next();
+            if (bodyBytes == null || bodyBytes.length == 0) {
+                continue;
+            }
+            // 上报方发送的原始用户数据
+            originalContentByteList.add(bodyBytes);
+        }
+    }
+    return originalContentByteList;
+}
+```
\ No newline at end of file
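The V4 bit layouts documented above (MsgType: lower 5 bits for the message type, upper 3 bits for the compression method; ExtField: bits 1, 2, and 5) can be illustrated with a few bit masks. This is a minimal sketch under stated assumptions, not code from `inlong-common`: the class and method names are hypothetical, MsgType is assumed to be a single byte, and bit 0 is assumed to be the least significant bit.

```java
class InLongMsgV4Bits {

    // Lower 5 bits of MsgType: message type.
    static int messageType(int msgTypeField) {
        return msgTypeField & 0x1f;
    }

    // Upper 3 bits of MsgType: compression method.
    static int compressionType(int msgTypeField) {
        return (msgTypeField >> 5) & 0x07;
    }

    // ExtField bit 1: 1 means each data item carries private attributes.
    static boolean hasPrivateAttributes(int extField) {
        return (extField & (1 << 1)) != 0;
    }

    // ExtField bit 2: 0 means numeric ("digital") group and stream IDs
    // are enabled, 1 means they are not.
    static boolean usesNumericIds(int extField) {
        return (extField & (1 << 2)) == 0;
    }

    // ExtField bit 5: 1 means multiple data items are newline-separated.
    static boolean isNewlineSeparated(int extField) {
        return (extField & (1 << 5)) != 0;
    }

    public static void main(String[] args) {
        int msgTypeField = 0x27; // binary 001 00111 (made-up value)
        int extField = 0x22;     // bits 1 and 5 set, bit 2 clear (made-up value)
        System.out.println(messageType(msgTypeField));      // prints 7
        System.out.println(compressionType(msgTypeField));  // prints 1
        System.out.println(hasPrivateAttributes(extField)); // prints true
        System.out.println(usesNumericIds(extField));       // prints true
        System.out.println(isNewlineSeparated(extField));   // prints true
    }
}
```

Note the inverted sense of bit 2: unlike bits 1 and 5, a cleared bit means the feature is enabled, which is easy to get wrong when writing a decoder by hand.
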
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/inlong_msg.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/inlong_msg.md
deleted file mode 100644
index 5a0df85a5d5..00000000000
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/development/inlong_msg.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-title: 解析 InLongMsg
-sidebar_position: 1
----
-
-import {siteVariables} from '../version';
-
-## 总览
-如果直接从消息队列(InLong TubeMQ 或Pulsar)消费数据,需要先对`InLongMsg` 进行解析。可通过以下方式可以解析出源数据。
-
-## 解析
-- 增加maven 依赖
-<pre><code parentName="pre">
-{`<dependency>
-    <groupId>org.apache.inlong</groupId>
-    <artifactId>inlong-common</artifactId>
-    <version>${siteVariables.inLongVersion}</version>
-</dependency>
-`}
-</code></pre>
-
-- 增加解析逻辑
-```java
-public static List<byte[]> parserInLongMsg(byte[] bytes) {
-    List<byte[]> originalContentByteList = new ArrayList<>();
-    InLongMsg inLongMsg = InLongMsg.parseFrom(bytes);
-    Set<String> attrs = inLongMsg.getAttrs();
-    if (CollectionUtils.isEmpty(attrs)) {
-        return originalContentByteList;
-    }
-    for (String attr : attrs) {
-        if (attr == null) {
-            continue;
-        }
-        Iterator<byte[]> iterator = inLongMsg.getIterator(attr);
-        if (iterator == null) {
-            continue;
-        }
-        while (iterator.hasNext()) {
-            byte[] bodyBytes = iterator.next();
-            if (bodyBytes == null || bodyBytes.length == 0) {
-                continue;
-            }
-            // agent 发送的原始用户数据
-            originalContentByteList.add(bodyBytes);
-        }
-    }
-    return originalContentByteList;
-}
-```
\ No newline at end of file
