fuweng11 commented on code in PR #965:
URL: https://github.com/apache/inlong-website/pull/965#discussion_r1699778760


##########
blog/2024-07-18-release-1.13.0.md:
##########
@@ -0,0 +1,142 @@
+---
+title: Release 1.13.0
+author: Wenkai Fu
+author_url: https://github.com/fuweng11
+author_image_url: https://avatars.githubusercontent.com/u/8108604?s=400&v=4
+tags: [Apache InLong, Version]
+---
+
+Apache InLong recently released version 1.13.0, which closed about 275+ issues, including 6+ major features and 100+ optimizations. The main features include Manager support for Agent install package management and its self-upgrade process, Agent support for self-upgrading, Agent support for collecting data from Kafka, Pulsar, and MongoDB, support for the Redis connector in the Sort module, and optimization of Audit and enhancement of its capabilities. After the release of 1.13.0, Apache InLong has enriched and optimized Agent usage scenarios, enhanced the accuracy of Audit data measurement, enriched the capabilities and applicable scenarios of Sort, addressed the need for quick troubleshooting in development and operations, and improved the user experience of Apache InLong operation and maintenance.
+<!--truncate-->
+
+## About Apache InLong
+
+As the industry's first one-stop, full-scenario, open-source massive data integration framework, Apache InLong provides automatic, safe, reliable, and high-performance data transmission capabilities to help businesses quickly build stream-based data analysis, modeling, and applications. At present, InLong is widely used in industries such as advertising, payment, social networking, games, and artificial intelligence, serving thousands of businesses; the data scale of high-performance scenarios exceeds 100 trillion lines per day, and that of high-reliability scenarios exceeds 10 trillion lines per day.
+
+The core keywords of the InLong project positioning are "one-stop", "full-scenario", and "massive data". For "one-stop", we hope to shield technical details, provide complete data integration and supporting services, and achieve out-of-the-box use; for "full-scenario", we hope to provide a comprehensive solution covering the common data integration scenarios in the big data field; for "massive data", we hope that, with architectural advantages such as layered data pipelines, fully extensible components, and built-in multi-cluster management, InLong can stably support even larger data volumes on top of 100 trillion lines per day.
+
+## 1.13.0 Version Overview
+
+Apache InLong recently released version 1.13.0, which closed about 275+ issues, including 6+ major features and 100+ optimizations. The main features include Manager support for Agent install package management and its self-upgrade process, Agent support for self-upgrading, Agent support for collecting data from Kafka, Pulsar, and MongoDB, support for the Redis connector in the Sort module, and optimization of Audit and enhancement of its capabilities. After the release of 1.13.0, Apache InLong has enriched and optimized Agent usage scenarios, enhanced the accuracy of Audit data measurement, enriched the capabilities and applicable scenarios of Sort, addressed the need for quick troubleshooting in development and operations, and improved the user experience of Apache InLong operation and maintenance. In Apache InLong 1.13.0, a large number of other features have also been completed, mainly including:
+
+### Agent Module
+- Support data version numbers to distinguish between normal data and 
supplementary data
+- Location storage supports plugins, currently supporting RocksDB and ZooKeeper
+- Support configuration version number comparison to prevent repeated configuration
+- Support minute-level file collection
+- Add PostgreSQL and MongoDB data source collection
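As an illustration of the version-comparison idea above, here is a minimal sketch (not the Agent's actual implementation; all names are hypothetical): a pushed configuration is applied only when its version number is strictly newer than the one in effect, so a replayed or out-of-order push cannot roll the configuration back.

```python
# Hypothetical sketch of configuration version comparison; InLong Agent's
# real logic lives in the Agent/Manager protocol, not in this code.
class ConfigHolder:
    def __init__(self):
        self.version = -1
        self.config = None

    def apply(self, version: int, config: dict) -> bool:
        """Apply a pushed config only if its version is strictly newer."""
        if version <= self.version:
            return False  # stale or duplicate push: ignore it
        self.version = version
        self.config = config
        return True

holder = ConfigHolder()
print(holder.apply(2, {"task": "file-collect"}))  # True: newer version applied
print(holder.apply(1, {"task": "old"}))           # False: stale version ignored
```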
+
+### Manager Module
+- Support installing agents through SSH
+- Switch audit ID query from direct interaction with database to Audit SDK
+- Offline synchronization supports Pulsar -> MySQL
+- Support offline synchronous scheduling information management
+- File collection supports multi-IP collection
+- Support obtaining Agent configuration information
+- Support automatic synchronization to Sink after modifying Stream field 
information
+- Support field template management
+- Data preview supports KV format
+- Data preview supports querying based on field filtering criteria
+
+### Dashboard Module
+- Add Source Data Field Template Page
+- Add monitoring and auditing page
+- Support installing agents through SSH key based authentication
+- Audit supports displaying total and variance audit data
+- File type data stream supports minute level cycles
+
+### Audit Module
+- Unified allocation and management of audit items using the Audit SDK
+- The Audit SDK supports automatic management of Audit Proxy addresses
+- Audit SDK fixes inaccurate audit reconciliation caused by TCP packet sticking
+- Audit SDK optimizes audit item and metric management
+- Audit Store supports the universal JDBC protocol
+- Audit Store fixes potential data loss caused by process restarts
+- Audit Store cleans up obsolete historical code
+- Audit Service optimizes thread pool management
+- Audit Service is compatible with historical audit data with an empty Audit Tag
+- Audit Service optimizes the OpenAPI audit transmission delay calculation
+- Audit Service OpenAPI supports querying historical hourly audit data from 
one day ago
+- Audit Service supports automatic management of partitions
+- Optimizing container environment variable conflicts
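For context on the TCP packet-sticking fix above: when several logical messages arrive coalesced in a single TCP read, a receiver typically recovers the message boundaries with length-prefixed framing. The following is a generic Python sketch of that technique, not the Audit SDK's actual code:

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def deframe(buffer: bytes):
    """Split a byte stream that may contain several coalesced frames."""
    messages, offset = [], 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from(">I", buffer, offset)
        if offset + 4 + length > len(buffer):
            break  # partial frame: wait for more bytes
        messages.append(buffer[offset + 4:offset + 4 + length])
        offset += 4 + length
    return messages, buffer[offset:]  # decoded messages + leftover bytes

# Two audit records "stuck" together in one read are still split correctly.
stream = frame(b"audit-1") + frame(b"audit-2")
msgs, rest = deframe(stream)
print(msgs)  # [b'audit-1', b'audit-2']
```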
+
+### Sort Module
+- Supports using the state key during StarRocks connector initialization
+- Supports parsing KV and CSV data containing split symbols
+- Using ZLIB as the default compression type for Pulsar Sink
+- Pulsar Connector supports authentication configuration
+- Pulsar Sink supports authentication configuration
+- Redis Source supports String, Hash, and ZSet data types
+- Redis Sink supports Bitmap, Hash, and String data types
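To illustrate the parsing challenge behind the KV/CSV item above, here is a generic sketch under the assumption that delimiters inside values are escaped with a backslash (this is not Sort's actual parser):

```python
def split_escaped(line: str, delimiter: str, escape: str = "\\"):
    """Split on a delimiter while honoring escaped delimiter characters."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields

# A CSV value containing an escaped comma stays in one field.
print(split_escaped(r"a\,b,c", ","))  # ['a,b', 'c']
# The same routine splits KV pairs whose values contain an escaped '&'.
print(split_escaped(r"k1=v\&1&k2=v2", "&"))  # ['k1=v&1', 'k2=v2']
```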
+
+## 1.13.0 Version Feature Introduction
+
+### Manager supports installing Agent by SSH
+Through this feature, operation and maintenance personnel can install agents 
through the Dashboard, which currently supports SSH and manual installation 
methods. Users can create a new Agent cluster on the cluster management page.
+![1.13.0-agent-cluster.png](img%2F1.13.0-agent-cluster.png)
+Afterwards, enter the node page, create a new node, and configure the SSH username and password to enable SSH-based Agent installation. Thanks to @haifxu and @fuweng11. For more information, please refer to INLONG-10409.
+![1.13.0-agent-install.png](img%2F1.13.0-agent-install.png)
+
+### Manager supports field template management
+Through this feature, users can pre-configure field templates and, when creating a new Stream, select an already configured template, thereby reusing the same field configuration across multiple Streams.
+Thanks to @kamianlaida and @fuweng11. For more information, please refer to: 
INLONG-10330.
+![1.13.0-create-template.png](img%2F1.13.0-create-template.png)
+![1.13.0-select-template.png](img%2F1.13.0-select-template.png)
+![1.13.0-import-template.png](img%2F1.13.0-import-template.png)
+
+### InLong supports configuring offline synchronization tasks Pulsar -> MySQL
+In version 1.13.0, Manager supports the configuration of offline synchronization tasks. Compared to real-time synchronization, offline data synchronization (not supported yet) pays more attention to synchronization throughput and efficiency.
+Real-time synchronization tasks run as Flink streaming jobs, while offline synchronization runs as Flink batch jobs. This keeps the code of real-time and offline synchronization tasks as consistent as possible, reducing maintenance costs.
+The offline synchronization function of InLong works with a scheduling system to synchronize the full or incremental data of a data source to a data target. Offline synchronization tasks are created by InLong Manager (including scheduling information), and the concrete data synchronization logic is implemented by the InLong Sort module.
+![1.13.0-manager-offline.png](img%2F1.13.0-manager-offline.png)
+Key capabilities:
+- Job Configuration: supports Wizard Mode (configuration through a page wizard) and OpenAPI mode.
+- Scheduling Configuration: supports Wizard Mode (configuration through a page wizard) and OpenAPI mode.
+- Job Type: supports periodic incremental synchronization and periodic full synchronization.
+- Scheduling: built-in simple periodic scheduling capability; complex capabilities such as task dependencies are supported by third-party scheduling systems.
+- Data Source: RDBMS, message queues, and big data storage (Hive, StarRocks, Iceberg, etc.)
+- Data Sink: RDBMS, message queues, and big data storage (Hive, StarRocks, Iceberg, etc.)
+- Compute Engine: Flink
+- Offline Job Operation and Maintenance: job start, stop, and running status monitoring
+- Special Handling: dirty data processing capability
+
+The following is the core process:
+![1.13.0-dataflow-architecture.png](img%2F1.13.0-dataflow-architecture.png)
+
+### Optimize the Sort Standalone configuration process
+In version 1.13.0, the Sort Standalone configuration distribution process was reworked. In previous versions, it had the following issues:
+- Configuration changes were unreliable. Changes were updated in real time to the Manager's cache; once the configuration changed, Sort Standalone sensed it and wrote data based on the new configuration without any validation.
+- The configuration build process was repetitive and cumbersome. The Sort configuration was pulled from the database in full, and a real-time build was performed after each pull.
+
+In the new version, after a data target is modified, the Sort configuration no longer takes effect in real time; instead, it is built and written into the sort_config table after the workflow is executed. The following figure shows a process comparison:
+![1.13.0-dataflow-architecture.png](img%2F1.13.0-dataflow-architecture.png)

Review Comment:
   Fixed.



##########
i18n/zh-CN/docusaurus-plugin-content-blog/2024-07-18-release-1.13.0.md:
##########
@@ -0,0 +1,145 @@
+---
+title: Release 1.13.0
+author: Wenkai Fu
+author_url: https://github.com/fuweng11
+author_image_url: https://avatars.githubusercontent.com/u/8108604?s=400&v=4
+tags: [Apache InLong, Version]
+---
+
+Apache InLong recently released version 1.13.0, which closed 275+ issues, including 7+ major features and 90+ optimizations. The main features include Manager support for Agent install package management and its self-upgrade process, Agent support for self-upgrading, Agent support for collecting data from Kafka/Pulsar/MongoDB, Audit optimization and capability enhancement, and a new Redis connector in the Sort module. After the release of 1.13.0, Apache InLong has enriched and optimized Agent usage scenarios, enhanced the accuracy of Audit data measurement, enriched the capabilities and applicable scenarios of Sort, and improved a number of issues and the user experience encountered during Apache InLong operation and maintenance.
+<!--truncate-->
+
+## About Apache InLong
+As the industry's first one-stop, full-scenario, open-source massive data integration framework, Apache InLong provides automatic, safe, reliable, and high-performance data transmission capabilities to help businesses quickly build stream-based data analysis, modeling, and applications. At present, InLong is widely used in industries such as advertising, payment, social networking, games, and artificial intelligence, serving thousands of businesses; the data scale of high-performance scenarios exceeds 100 trillion lines per day, and that of high-reliability scenarios exceeds 10 trillion lines per day.
+
+The core keywords of the InLong project positioning are "one-stop", "full-scenario", and "massive data". For "one-stop", we hope to shield technical details, provide complete data integration and supporting services, and achieve out-of-the-box use; for "full-scenario", we hope to provide a comprehensive solution covering the common data integration scenarios in the big data field; for "massive data", we hope that, with architectural advantages such as layered data pipelines, fully extensible components, and built-in multi-cluster management, InLong can stably support even larger data volumes on top of 100 trillion lines per day.
+
+## 1.13.0 Version Overview
+Apache InLong recently released version 1.13.0, which closed 275+ issues, including 6+ major features and 100+ optimizations. The main features include Manager support for Agent install package management and its self-upgrade process, Agent support for self-upgrading, Agent support for collecting data from Kafka/Pulsar/MongoDB, Audit optimization and capability enhancement, and a new Redis connector in the Sort module. After the release of 1.13.0, Apache InLong has enriched and optimized Agent usage scenarios, enhanced the accuracy of Audit data measurement, enriched the capabilities and applicable scenarios of Sort, and improved a number of issues and the user experience encountered during Apache InLong operation and maintenance. Apache InLong 1.13.0 also completed a large number of other features, mainly including:
+
+### Agent Module
+- Support data version numbers to distinguish normal data from supplementary data
+- Position storage supports plugins, currently supporting RocksDB and ZooKeeper
+- Support configuration version number comparison to prevent configuration flip-flopping
+- Support minute-level file collection
+- Add PostgreSQL data source collection
+
+### Manager Module
+- Support installing Agent via SSH
+- Switch audit ID queries from direct database access to the Audit SDK
+- Offline synchronization supports Pulsar -> MySQL
+- Support offline synchronization scheduling information management
+- File collection supports multi-IP collection
+- Support obtaining Agent configuration information
+- Support automatic synchronization to Sink after modifying Stream field information
+- Support field template management
+- Data preview supports the KV format
+- Data preview supports querying based on field filtering criteria
+
+### Dashboard Module
+- Add a source data field template page
+- Add a monitoring and auditing page
+- Support installing Agent through SSH key-based authentication
+- Audit supports displaying total and difference audit data
+- File-type data streams support minute-level cycles
+
+### Audit Module
+- The Audit SDK uniformly allocates and manages audit items
+- The Audit SDK supports automatic management of Audit Proxy addresses
+- The Audit SDK fixes inaccurate audit reconciliation caused by TCP packet sticking
+- The Audit SDK optimizes audit item and metric management
+- Audit Store supports the universal JDBC protocol
+- Audit Store fixes potential data loss caused by process restarts
+- Audit Store cleans up obsolete historical code
+- Audit Service optimizes thread pool management
+- Audit Service is compatible with historical audit data with an empty Audit Tag
+- Audit Service optimizes the OpenAPI audit transmission delay calculation
+- Audit Service OpenAPI supports querying historical hourly audit data from more than one day ago
+- Audit Service supports automatic management of partitions
+- Optimize container environment variable conflicts
+
+### Sort Module
+- Add a JDBC connector on Flink 1.15
+- Add a Pulsar connector on Flink 1.18
+- The Redis connector supports reporting audit metrics
+- The Kafka connector supports reporting audit metrics
+- The MongoDB connector supports reporting audit metrics
+- The PostgreSQL connector supports reporting audit metrics
+- Upgrade the Flink version from 1.13.6 to 1.15.4
+
+
+### SDK Module
+- Add a DataProxy Python SDK
+- DataProxy Python SDK: enhance Transform SDK SQL function support, adding 8 arithmetic functions (power, abs, sqrt, ln, log10, log2, log, exp)
+- DataProxy Go SDK: the connection pool supports dynamic balancing and failed-node recovery probing
+- DataProxy Go SDK: fix a gnet initialization order issue to avoid blocking when upgrading to a newer gnet version
+- DataProxy Go SDK: fix a potential blocking issue to avoid blocking when updating connections
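To illustrate what the eight arithmetic functions listed above compute, here is a minimal, hypothetical Python sketch (it is not the Transform SDK's actual implementation; the registry and `apply_function` helper are illustrative only, and the two-argument `log(base, x)` signature is an assumption modeled on common SQL dialects):

```python
import math

# Hypothetical registry mirroring the eight arithmetic functions listed above;
# the real Transform SDK evaluates these inside SQL-style expressions.
TRANSFORM_FUNCTIONS = {
    "power": lambda x, y: math.pow(x, y),
    "abs": abs,
    "sqrt": math.sqrt,
    "ln": math.log,          # natural logarithm
    "log10": math.log10,
    "log2": math.log2,
    "log": lambda base, x: math.log(x, base),  # log with an explicit base
    "exp": math.exp,
}

def apply_function(name, *args):
    """Look up a function by name and apply it, as a SQL engine might."""
    return TRANSFORM_FUNCTIONS[name](*args)

print(apply_function("power", 2, 10))  # 1024.0
print(apply_function("log", 2, 8))     # 3.0
```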
+
+## 1.13.0 Version Feature Introduction
+
+### Manager supports installing Agent via SSH
+With this feature, operations personnel can install Agents through the Dashboard; SSH-based and manual installation are currently supported. Users can create a new Agent cluster on the cluster management page.
+![1.13.0-agent-cluster.png](img%2F1.13.0-agent-cluster.png)
+Afterwards, enter the node page, create a new node, and configure the SSH username and password to enable SSH-based Agent installation. Thanks to @haifxu and @fuweng11 for their contributions to this feature in the Dashboard and Manager modules. For more information, please refer to INLONG-10409.
+![1.13.0-agent-install.png](img%2F1.13.0-agent-install.png)
+
+### Manager supports field template management
+With this feature, users can pre-configure field templates and, when creating a new Stream, select an already configured template, thereby reusing the same field configuration across multiple Streams.
+Thanks to @haifxu and @fuweng11 for their contributions to this feature in the Dashboard and Manager modules. For more information, please refer to INLONG-10330.
+![1.13.0-create-template.png](img%2F1.13.0-create-template.png)
+![1.13.0-select-template.png](img%2F1.13.0-select-template.png)
+![1.13.0-import-template.png](img%2F1.13.0-import-template.png)
+
+### InLong supports configuring offline synchronization tasks Pulsar -> MySQL
+In version 1.13.0, InLong supports the configuration of offline synchronization tasks. Compared with real-time synchronization, offline data synchronization (not supported yet) pays more attention to synchronization throughput and efficiency.
+The implementation is uniformly based on the Flink compute engine: real-time synchronization tasks run as Flink streaming jobs, while offline synchronization runs as Flink batch jobs. This keeps the code of real-time and offline synchronization tasks as consistent as possible, reducing maintenance costs.
+The offline synchronization function of InLong works with a scheduling system to synchronize the full or incremental data of a data source to a data target. Offline synchronization tasks are created by InLong Manager (including scheduling information), and the concrete data synchronization logic is implemented by the InLong Sort module.
+![1.13.0-manager-offline.png](img%2F1.13.0-manager-offline.png)
+Key capabilities:
+- Job Configuration: supports Wizard Mode (configuration through a page wizard) and OpenAPI mode.
+- Scheduling Configuration: supports Wizard Mode (configuration through a page wizard) and OpenAPI mode.
+- Job Type: supports periodic incremental synchronization and periodic full synchronization.
+- Scheduling: built-in simple periodic scheduling capability; complex capabilities such as task dependencies are supported by third-party scheduling systems.
+- Data Source: RDBMS, message queues, and big data storage (Hive, StarRocks, Iceberg, etc.)
+- Data Sink: RDBMS, message queues, and big data storage (Hive, StarRocks, Iceberg, etc.)
+- Compute Engine: Flink
+- Offline Job Operation and Maintenance: job start, stop, and running status monitoring
+- Special Handling: dirty data processing capability
+The following is the core process:
+![1.13.0-dataflow-architecture.png](img%2F1.13.0-dataflow-architecture.png)
+
+### Optimize the Sort Standalone configuration process
+In version 1.13.0, the Sort Standalone configuration distribution process was reworked. In previous versions, it had the following issues:
+- Configuration changes were unreliable. Changes were updated in real time to the Manager's cache; once the configuration changed, Sort Standalone sensed it and wrote data based on the new configuration without any validation.
+- The configuration build process was repetitive and cumbersome. The Sort configuration was pulled from the database in full, and a real-time build was performed after each pull.
+In the new version, after a data target is modified, the Sort configuration no longer takes effect in real time; instead, it is built and written into the sort_config table after the workflow is executed. The following figure shows a process comparison:
+![1.13.0-dataflow-architecture.png](img%2F1.13.0-dataflow-architecture.png)

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
