This is an automated email from the ASF dual-hosted git repository.

gosonzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git
The following commit(s) were added to refs/heads/master by this push:
     new c6079e594c [INLONG-895][Doc] Improve HTTP report documentation (#896)
c6079e594c is described below

commit c6079e594ca044ad318c0f55807b5d8c612ce788
Author: Goson Zhang <4675...@qq.com>
AuthorDate: Thu Nov 30 19:24:23 2023 +0800

    [INLONG-895][Doc] Improve HTTP report documentation (#896)
---
 docs/sdk/dataproxy-sdk/http.md                     |  16 ++++++++++++++++
 docs/sdk/dataproxy-sdk/img/http_report.png         | Bin 0 -> 166652 bytes
 .../current/sdk/dataproxy-sdk/http.md              |  20 ++++++++++++++++++++
 .../current/sdk/dataproxy-sdk/img/http_report.png  | Bin 0 -> 166652 bytes
 4 files changed, 36 insertions(+)

diff --git a/docs/sdk/dataproxy-sdk/http.md b/docs/sdk/dataproxy-sdk/http.md
index 159884836c..98407553c8 100644
--- a/docs/sdk/dataproxy-sdk/http.md
+++ b/docs/sdk/dataproxy-sdk/http.md
@@ -3,9 +3,25 @@ title: HTTP Report
 sidebar_position: 3
 ---
+## Introduction to the HTTP Reporting Process
+InLong processes HTTP report messages through DataProxy nodes: the reporting source periodically obtains the access point list from the Manager, selects available HTTP reporting nodes from that list based on its own strategy, and then uses the HTTP protocol for data production. The overall HTTP reporting process is illustrated in the following diagram:
+
+
+
+- Heartbeat reporting: DataProxy periodically reports heartbeats to the Manager, providing information about the enabled access points, including {IP, Port, Protocol, Load}.
+- Online node caching: The Manager caches the heartbeat information reported by DataProxy to track the available access nodes in the cluster and their reporting access information.
+- Access point acquisition: The HTTP SDK (either the HttpProxySender implemented by DataProxy-SDK or an HTTP reporting SDK developed according to the HTTP reporting protocol) periodically calls the Manager's "/inlong/manager/openapi/dataproxy/getIpList/{inlongGroupId}" method to obtain the list of available reporting access points for the current groupId.
+- Access point selection: The HTTP SDK selects the DataProxy node for message reporting based on the reporting node selection strategy.
+- Data reporting: The HTTP SDK constructs the reporting message according to the HTTP reporting protocol, sends the request message to the selected DataProxy node, and, after receiving the response, resends the message or outputs an exception depending on the result.
+- Data acceptance: DataProxy checks the HTTP message. If the message is successfully accepted, it returns a success response and forwards the message to the MQ cluster. If the message format or value does not meet the specifications, or if the message processing fails, DataProxy returns a failure response with the corresponding error code and detailed error information.
+
+Suggestion:
+Because HTTP reporting suffers from low performance, a low proportion of valid data, and easily lost request messages, businesses are advised to prefer the TCP method for data reporting.
+
 ## Create real-time synchronization task
 Create a task on the Dashboard or through the command line, and use `Auto Push` (autonomous push) as the data source type.
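+
 ## Method 1: Call the interface to report (CURL)
 ```bash
 curl -X POST -d 'groupId=give_your_group_id&streamId=give_your_stream_id&dt=data_time&body=give_your_data_body&cnt=1' http://dataproxy_url:46802/dataproxy/message
diff --git a/docs/sdk/dataproxy-sdk/img/http_report.png b/docs/sdk/dataproxy-sdk/img/http_report.png
new file mode 100644
index 0000000000..7b49d8641b
Binary files /dev/null and b/docs/sdk/dataproxy-sdk/img/http_report.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/http.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/http.md
index 427f5712ae..b2b07d422d 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/http.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/http.md
@@ -3,6 +3,26 @@ title: HTTP Report
 sidebar_position: 3
 ---
+## Introduction to the HTTP Reporting Process
+InLong processes HTTP report messages through DataProxy nodes: the reporting source periodically obtains the access point list from the Manager, selects available HTTP reporting nodes from that list based on its own strategy, and then uses the HTTP protocol for data production. The overall HTTP reporting process is shown in the following diagram:
+
+
+
+- Heartbeat reporting: DataProxy periodically reports heartbeats to the Manager, providing the {IP, Port, Protocol, Load} information of the access points enabled on that node;
+
+- Online node caching: The Manager caches the heartbeat information reported by DataProxy to track the available access nodes in the cluster and the available reporting access information;
+
+- Access point acquisition: The HTTP SDK (the reporting source uses either the HttpProxySender implemented by DataProxy-SDK or an HTTP reporting SDK developed independently according to the HTTP reporting protocol) periodically calls the "/inlong/manager/openapi/dataproxy/getIpList/{inlongGroupId}" method on the Manager to obtain the list of available reporting access points for the groupId currently being reported;
+
+- Access point selection: The HTTP SDK selects the DataProxy node to report messages to, based on the reporting node selection strategy;
+
+- Data reporting: The HTTP SDK constructs the reporting message according to the HTTP reporting protocol, sends the request message to the selected DataProxy node, and, after receiving the response, decides whether to resend, output an exception, and so on, depending on the result;
+
+- Data acceptance: DataProxy checks the HTTP message. If the message is accepted, it returns a success response and forwards the message to the MQ cluster; if the message format or value does not meet the specifications, or if message processing fails, DataProxy returns a failure response carrying the corresponding error code and detailed error information.
+
+Suggestion:
+ Because HTTP reporting suffers from low performance, a low proportion of valid data, and easily lost request messages, businesses are advised to use the TCP method for data reporting where possible.
+
 ## Create real-time synchronization task
 Create a task on the Dashboard or with the command-line tool, and use `Auto Push` (autonomous push) as the data source type.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/img/http_report.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/img/http_report.png
new file mode 100644
index 0000000000..7b49d8641b
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/img/http_report.png differ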
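
Note on the documented flow: the access point selection and data reporting steps described in this commit can be illustrated with a small Python sketch. It is not taken from DataProxy-SDK and makes several assumptions: the node dictionaries (`ip`, `port`, `protocol`, `load`) are a hypothetical shape modelled on the {IP, Port, Protocol, Load} heartbeat fields, since the actual getIpList response format is not shown here; the "least-loaded HTTP node" rule is only one possible selection strategy; and `dt` is assumed to be a millisecond Unix timestamp. The form fields and the `/dataproxy/message` path come from the curl example in the diff. The sketch only builds the request and does not send it.

```python
import time
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def pick_node(nodes):
    """Hypothetical strategy: choose the least-loaded node that speaks HTTP."""
    http_nodes = [n for n in nodes if n["protocol"].upper() == "HTTP"]
    if not http_nodes:
        raise ValueError("no HTTP access point available")
    return min(http_nodes, key=lambda n: n["load"])


def build_report_body(group_id, stream_id, body, dt=None, cnt=1):
    """URL-encode the form fields used in the curl example."""
    if dt is None:
        # Assumption: dt is a millisecond Unix timestamp.
        dt = int(time.time() * 1000)
    fields = {
        "groupId": group_id,
        "streamId": stream_id,
        "dt": dt,
        "body": body,
        "cnt": cnt,
    }
    return urlencode(fields)


def build_request(node, form_body):
    """Build (but do not send) the POST request for the selected node."""
    url = f"http://{node['ip']}:{node['port']}/dataproxy/message"
    return Request(url, data=form_body.encode("utf-8"), method="POST")


if __name__ == "__main__":
    # Hypothetical access point list; real data would come from the Manager.
    nodes = [
        {"ip": "10.0.0.1", "port": 46802, "protocol": "HTTP", "load": 7},
        {"ip": "10.0.0.2", "port": 46802, "protocol": "HTTP", "load": 3},
    ]
    form = build_report_body("give_your_group_id", "give_your_stream_id", "k=v")
    request = build_request(pick_node(nodes), form)
    print(request.get_method(), request.full_url)
    print(parse_qs(form))
```

Passing the resulting request to `urllib.request.urlopen` would perform the actual report; handling of the response (resend or exception output) is left out because the response format is not described in this commit.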