This is an automated email from the ASF dual-hosted git repository.

aloyszhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 7a21ea85409 [INLONG-1054][Release] Add blog for the 2.0.0 release (#1080)
7a21ea85409 is described below

commit 7a21ea854099d1d9b8743e7e2893de07d463659a
Author: AloysZhang <lofterzh...@gmail.com>
AuthorDate: Wed Oct 30 14:18:38 2024 +0800

    [INLONG-1054][Release] Add blog for the 2.0.0 release (#1080)
---
 blog/2024-10-20-release-2.0.0.md                   | 292 +++++++++++++++++++++
 blog/img/2.0.0/2.0.0-oceanbase-detail.png          | Bin 0 -> 33531 bytes
 blog/img/2.0.0/2.0.0-oceanbase-target.png          | Bin 0 -> 39440 bytes
 blog/img/2.0.0/2.0.0-oceanbase-type.png            | Bin 0 -> 90738 bytes
 blog/img/2.0.0/2.0.0-offline-schedule-common.png   | Bin 0 -> 23087 bytes
 blog/img/2.0.0/2.0.0-offline-schedule-crontab.png  | Bin 0 -> 19874 bytes
 blog/img/2.0.0/2.0.0-offline-sync-group.png        | Bin 0 -> 29945 bytes
 blog/img/2.0.0/2.0.0-sortstandalone-http.png       | Bin 0 -> 189186 bytes
 blog/img/2.0.0/2.0.0-transform-background.png      | Bin 0 -> 273870 bytes
 .../2024-10-20-release-2.0.0.md                    | 256 ++++++++++++++++++
 .../img/2.0.0/2.0.0-oceanbase-detail.png           | Bin 0 -> 33563 bytes
 .../img/2.0.0/2.0.0-oceanbase-target.png           | Bin 0 -> 38295 bytes
 .../img/2.0.0/2.0.0-oceanbase-type.png             | Bin 0 -> 91285 bytes
 .../img/2.0.0/2.0.0-offline-schedule-common.png    | Bin 0 -> 208429 bytes
 .../img/2.0.0/2.0.0-offline-schedule-crontab.png   | Bin 0 -> 143933 bytes
 .../img/2.0.0/2.0.0-offline-sync-group.png         | Bin 0 -> 258732 bytes
 .../img/2.0.0/2.0.0-sortstandalone-http.png        | Bin 0 -> 189186 bytes
 .../img/2.0.0/2.0.0-transform-background.png       | Bin 0 -> 705029 bytes
 18 files changed, 548 insertions(+)

diff --git a/blog/2024-10-20-release-2.0.0.md b/blog/2024-10-20-release-2.0.0.md
new file mode 100644
index 00000000000..7ed95336b88
--- /dev/null
+++ b/blog/2024-10-20-release-2.0.0.md
@@ -0,0 +1,292 @@
+---
+title: Release 2.0.0
+author: Aloys Zhang
+author_url: https://github.com/aloyszhang
+author_image_url: https://avatars.githubusercontent.com/u/48062889?v=4?s=400
+tags: [Apache InLong, Version]
+---
+
+Apache InLong (应龙) has recently released version 2.0.0, which closed more than 315 issues, including 6 major features and over 96 optimizations.
+The main accomplishments include support for the Transform SDK and its integration into the ES Sink of Sort Standalone, OceanBase data source management, adaptive resource configuration for Sort tasks, HTTP output for Sort Standalone, community documentation restructuring, and full-link support for offline synchronization.
+
+After the 2.0.0 release, Apache InLong has added Transform capabilities, improved support for the Agent Pulsar Source, enriched the capabilities and applicable scenarios of Sort, and optimized the display of the InLong Dashboard, addressing various issues encountered during the operation and maintenance of InLong.
+
+<!--truncate-->
+
+## About Apache InLong
+As the industry's first one-stop, all-scenario massive data integration framework, Apache InLong provides automated, secure, reliable, and high-performance data transmission capabilities, enabling businesses to quickly build stream-based data analysis, modeling, and applications.
+Currently, InLong is widely used in industries including advertising, payment, social networking, gaming, and artificial intelligence, serving thousands of businesses, with high-performance scenarios processing over one hundred trillion records per day and high-reliability scenarios handling over ten trillion records per day.
+
+The core keywords of InLong's project positioning are "one-stop," "all-scenario," and "massive data."
+For "one-stop," we aim to shield technical details, provide complete data integration and supporting services, and achieve out-of-the-box usability;
+for "all-scenario," we aim to offer comprehensive solutions covering the common data integration scenarios in the big data field;
+for "massive data," we aim to leverage architectural advantages such as layered data links, fully extensible components, and built-in multi-cluster management to stably support even larger data volumes on top of one hundred trillion records per day.
+
+## From 1.0 to 2.0
+In version 1.13.0, InLong added the underlying framework for offline synchronization tasks and supported Flink's stream-batch integration capability.
+Version 2.0.0 fixed several issues in the built-in scheduler, connected the front and back ends, and enabled the configuration of offline synchronization tasks on the Dashboard, providing full-process support for the configuration and management of offline synchronization tasks.
+Based on this capability, users can manage both real-time and offline synchronization tasks uniformly.
+
+In historical versions, InLong's standard and lightweight architectures focused on data collection, reporting, and loading data into warehouses and lakes, with relatively weak support for fine-grained operations on data.
+Therefore, after version 1.13.0, a very important task for the InLong community was to enhance Transform capabilities, allowing users to process data more flexibly at any stage of data integration.
+
+Transform is built on general SQL semantics. The Transform capability framework is now complete, supporting over 180 custom Transform functions, while the design ensures Transform's extensibility, allowing users to flexibly define new Transform capabilities.
+
+In addition to the Transform feature, another focus of the InLong community has been the restructuring of the community documentation.
+On one hand, we have filled in missing documentation and updated outdated content; on the other hand, we have reorganized the documentation into user guides, core system introductions, development guides, and system administration, providing better explanations from the user, developer, and operations perspectives.
+With the restructured documentation, users can more easily utilize InLong, better understand its operation and management mechanisms, and quickly develop custom plugins to meet specific needs.
+
+Overall, by the time version 2.0.0 was released:
+- InLong has achieved full-link support for offline synchronization tasks, providing stream-batch integrated data processing capabilities.
+- InLong has greatly enriched the capabilities of T (Transform), laying the foundation for future support of ELT/EtLT pipelines.
+- The optimization of documentation has made InLong more user-friendly, better attracting users to understand, use, and co-build InLong.
+
+InLong now supports both standard and lightweight architectures, stream-batch integrated data synchronization capabilities, and flexible data Transform capabilities. Therefore, the community has decided to upgrade InLong to version 2.0.0, marking the official entry of Apache InLong into the 2.0 era.
+
+
+## 2.0.0 Overview
+Apache InLong has recently released version 2.0.0, which closed more than 315 issues, including 6 major features and over 96 optimizations, achieving the following:
+- Support for the Transform SDK, integrated into the ES Sink of Sort Standalone
+- New capabilities for OceanBase data source management
+- Adaptive resource configuration for Sort tasks
+- HTTP output for Sort Standalone
+- Restructured and optimized community documentation
+- Stream-batch integration, with full-link support for offline synchronization
+
+In addition to the above features, version 2.0.0 also:
+- Improved support for the Agent Pulsar Source
+- Enriched the capabilities and usage scenarios of Sort
+- Fixed some issues encountered during InLong operation and maintenance
+- Optimized the display and user experience of the Dashboard
+
+### Agent Module
+- Optimized the Pulsar Source implementation to fix inaccurate consumption offsets.
+- Added support for data backfill filtering.
+- Introduced the ability to report Agent status.
+- Updated the implementations for the Redis, Oracle, SQLServer, and MQTT data sources.
+
+### Dashboard Module
+- Added an offline synchronization configuration page for data synchronization.
+- Optimized the style and structure of data preview.
+- Introduced a heartbeat display page for cluster node management.
+- Added cluster name display to data source information.
+- Supported custom ASCII code options for source data field delimiters.
+- Merged metric items with other items on the module review page.
+- Enabled delete operations for cluster management and template management.
+- Fixed errors in data preview.
+- Added support for OceanBase data sources.
+
+### Manager Module
+- Added support for OceanBase data source management.
+- Introduced TubeMQ configuration capabilities for Sort Standalone.
+- Supported asynchronous installation of Agent and the display of Agent installation logs.
+- Enabled configuration for HTTP type Sink.
+- Supported paginated queries for detailed information on Sort tasks.
+- Data preview now supports KV data types, escape characters, and filtering Tube data by StreamId.
+- Added data filtering functionality.
+- Permission optimization: Regular users are not allowed to modify Group information if they are not the Owner.
+- Fixed issues with offline synchronization updates.
+- Resolved alignment issues in data preview fields.
+
+### SDK Module
+- Transform supports data partitioning using GroupBy semantics.
+- Transform can parse Map nodes in JSON or PB data.
+- Transform JSON data source supports multidimensional arrays.
+- Transform supports ELT functionality.
+- Transform supports configuration and parsing of Transform annotations.
+- Transform supports various data source types: JSON, PB, XML, YAML, BSON, AVRO, ORC, PARQUET, etc.
+- Transform supports arithmetic functions: ceil, floor, sin, cos, cot, tanh, cosh, asin, atan, mod, etc.
+- Transform supports date and time functions: year, quarter, month, week, from_unixtime, unix_timestamp, to_timestamp, etc.
+- Transform supports string functions: substring, replace, reverse, etc.
+- Transform supports common encoding and encryption functions: MD5, ASCII, SHA.
+- Transform supports base-conversion and bitwise operation functions: HEX, Bitwise.
+- Transform supports compression and decompression functions: GZIP, ZIP, etc.
+- Transform includes other common functions: case conversion, IN, NOT IN, EXISTS, etc.
+- DataProxy Java SDK: Shaded the native library to reduce conflicts with other SDKs.
+- DataProxy Java SDK: Optimized the sending jitter issue during metadata changes.
+- DataProxy CPP SDK: Improved memory management and optimized build scripts.
+- DataProxy CPP SDK: Supports multiple protocols.
+- DataProxy CPP SDK: Added a message manager and optimized data reception capabilities.
+- DataProxy CPP SDK: Supports forking subprocesses.
+- DataProxy Python SDK: Updated build scripts and supports skipping the CPP SDK build step.
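+
+To make the function catalog above concrete, here is a minimal, hedged sketch of how such functions compose into a single Transform rule expressed in plain SQL. The commented-out `TransformProcessor` wiring reflects one reading of the SDK examples; treat those class and factory names as assumptions rather than verified signatures.
+
+```java
+// Illustrative only: shows Transform's SQL semantics, not the exact SDK API.
+public class TransformSqlSketch {
+    public static void main(String[] args) {
+        // One rule combining string, date/time, encryption, and filter
+        // functions from the 2.0.0 function catalog listed above.
+        String transformSql =
+                "SELECT substring(f1, 1, 8) AS user_id,"
+              + "       from_unixtime(f2)   AS event_time,"
+              + "       md5(f3)             AS masked_value"
+              + " FROM source"
+              + " WHERE f4 IN ('click', 'view')";
+
+        // Assumed wiring (names unverified): a processor built from the SQL
+        // plus a source decoder and a sink encoder, applied record by record.
+        // TransformProcessor<String, String> processor = TransformProcessor.create(
+        //         new TransformConfig(transformSql),
+        //         SourceDecoderFactory.createCsvDecoder(csvSourceInfo),
+        //         SinkEncoderFactory.createKvEncoder(kvSinkInfo));
+        // List<String> out = processor.transform(rawCsvRecord, new HashMap<>());
+        System.out.println(transformSql);
+    }
+}
+```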
+
+
+### Sort Module
+- Adjusted the resources required for Sort tasks based on data scale.
+- Added support for OceanBase data source.
+- Flink 1.18 now supports Elasticsearch 6 and Elasticsearch 7 connectors.
+- Sort Standalone Elasticsearch Sink supports Transform.
+- Sort Standalone supports HTTP Sink and batch sorting.
+- Connector supports OpenTelemetry log reporting.
+- Optimized producer parameters for Kafka connector.
+- Added end-to-end test cases for Flink 1.15.
+
+### Audit Module
+- Supports global memory control for the Audit SDK.
+- Optimized daily dimension audit data statistics.
+- Audit SDK allows custom local IP settings.
+- Unified audit aggregation interval range.
+- Resolved Protobuf version conflicts between Audit SDK and other components.
+
+## 2.0.0 Feature Introduction
+
+### New Transform Capabilities
+InLong Transform strengthens InLong's access and distribution capabilities: the input side adapts to a wider range of data protocols and reporting scenarios, while the output side accommodates complex and diverse data analysis scenarios.
+It improves data quality and collaboration, providing computing capabilities such as connection, aggregation, filtering, grouping, value extraction, and sampling, all decoupled from the computation engine.
+
+It simplifies the pre-processing required of users who report data, lowers the barrier to data usage, and streamlines the steps needed before users can start analyzing data.
+The focus is on the business value of data, achieving the goal of making data "visible and usable."
+
+![2.0.0-transform-background.png](img/2.0.0/2.0.0-transform-background.png)
+
+Transform has a wide range of application scenarios. Here are some typical examples:
+- Data Cleaning: During the data integration process, Transform capabilities can effectively eliminate errors, duplicates, and inconsistencies in data from different sources, improving data quality.
+- Data Fusion: Combining data from different sources for unified analysis and reporting. Transform capabilities can handle various formats and structures of data, achieving data fusion and integration.
+- Data Standardization: Converting data into a unified standard format for cross-system and cross-platform data analysis. Transform capabilities help enterprises achieve data standardization and normalization.
+- Data Partitioning and Indexing: To enhance the performance of data queries and analysis, Transform capabilities can dynamically adjust field values for data partitioning and indexing, thereby improving data warehouse performance.
+- Data Aggregation and Calculation: In the data analysis process, Transform capabilities can perform complex data aggregation and calculations to extract valuable business information, covering multidimensional data analysis.
+
+Main Features of Transform:
+- Support for Rich Data Protocols: Enables integration with a variety of data protocols.
+- Decoupled from the Computing Engine: Allows flexible processing without being tied to a specific computing engine.
+- Support for Rich Transformation Functions: Provides a wide range of functions for data transformation.
+- Lossless and Transparent Changes: Ensures that changes can be made without data loss or noticeable impact.
+- Automatic Scaling: Supports dynamic scaling up and down based on workload.
+
+Currently, Transform supports a variety of data formats and custom functions, allowing users to flexibly process data using SQL.
+Special thanks to contributors @luchunliang, @vernedeng, @emptyOVO, @ying-hua, @Zkplo, @MOONSakura0614, and @Ybszzzziz for their efforts.
+For more details, please refer to [Transform SDK Issues](https://github.com/aloyszhang/inlong/blob/master/CHANGES.md#sdk).
+
+### Community Documentation Restructuring
+With the continuous development of the InLong community, InLong's capabilities are constantly being enhanced; however, the community documentation has suffered from missing or outdated content.
+To address this, the InLong community has initiated a restructuring of the community documentation to better assist users in understanding and utilizing InLong.
+
+Main Content Includes:
+  - Optimized Document Structure: Better organization of document content. 
+  - Enhanced Quick Start Examples:
+    - Offline synchronization usage examples
+    - Transform SDK usage examples
+    - Data subscription usage examples
+    - HTTP message reporting usage examples
+  - Improved SDK Documentation:
+    - DataProxy: C++, Java, Golang, and Python SDKs, and HTTP data reporting manuals
+    - TubeMQ SDK: C++, Java, and Golang SDK usage manuals
+  - Enhanced Development Guidelines:
+    - Code compilation guidelines
+    - Documentation for data protocols of each component
+    - Documentation for extension development of each component
+    - REST API documentation
+  - Improved Management Articles: Documentation on user management, approval management, tenant management, node management, cluster management, tag management, template management, and Agent management.
+
+
+Currently, the community documentation has seen significant improvements in usage guidelines, development guidelines, and management guidelines.
+Special thanks to contributors @aloyszhang, @fuweng11, @vernedeng, @luchunliang, @gosonzhang, @doleyzi, @baomingyu, @justinwwhuang, and @wohainilaodou for their contributions to the documentation improvement.
+For more details, please refer to the [official website](https://inlong.apache.org/docs/next/introduction).
+
+### Support for OceanBase Data Source
+OceanBase Database is a distributed relational database characterized by high availability and scalability, suitable for large-scale data storage and processing scenarios. InLong version 2.0.0 adds support for OceanBase data sources, allowing data to be imported from data sources into OceanBase.
+
+Managing OceanBase data nodes is similar to MySQL. To create a new OceanBase node, you need to fill in the node name, type (OceanBase), username, password, address, and other key information.
+
+![2.0.0-oceanbase-detail.png](img/2.0.0/2.0.0-oceanbase-detail.png)
+
+To write data into OceanBase, you first need to create a data target of type `OceanBase`.
+
+![2.0.0-oceanbase-type.png](img/2.0.0/2.0.0-oceanbase-type.png)
+
+Then fill in the relevant information, including the name, data node information, database and table names of the data target, and the primary key information of the target table.
+
+![2.0.0-oceanbase-target.png](img/2.0.0/2.0.0-oceanbase-target.png)
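+
+Since OceanBase's MySQL-mode tenants speak the MySQL wire protocol, a plain JDBC connection is a quick way to sanity-check the node details entered above before saving them. This is a hedged sketch, not part of InLong itself: the host, port 2881, the `user@tenant` login form, and the database name are illustrative assumptions, and a MySQL JDBC driver is assumed on the classpath.
+
+```java
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.ResultSet;
+import java.sql.Statement;
+
+public class OceanBaseNodeCheck {
+    public static void main(String[] args) throws Exception {
+        // Hypothetical values mirroring the node form: address, username, password.
+        // OceanBase MySQL-mode usernames typically take the form "user@tenant".
+        String url = "jdbc:mysql://oceanbase-host:2881/test_db";
+        try (Connection conn = DriverManager.getConnection(url, "root@test_tenant", "password");
+             Statement stmt = conn.createStatement();
+             ResultSet rs = stmt.executeQuery("SELECT version()")) {
+            if (rs.next()) {
+                System.out.println("Connected, server version: " + rs.getString(1));
+            }
+        }
+    }
+}
+```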
+
+Thanks to @xxsc0529 for their contributions to this feature. For more details, please refer to [INLONG-10700](https://github.com/apache/inlong/pull/10700), [INLONG-10701](https://github.com/apache/inlong/pull/10701), and [INLONG-10704](https://github.com/apache/inlong/pull/10704).
+
+### Dynamic Resource Calculation for Sort Tasks
+The total resources (task parallelism) for Flink Sort jobs come from the configuration file `flink-sort-plugin.properties`, meaning that all submitted Sort jobs use the same amount of resources. When the data scale is large, resources may be insufficient; when the data scale is small, resources may be wasted.
+
+Therefore, dynamically calculating the required resources based on data volume is a much-needed feature. InLong now supports dynamically calculating the total resource requirements of a task based on its data volume, which involves two core pieces of data:
+
+- The data volume of the task: this relies on the audit system and is derived from the average data volume recorded by `DataProxy` over the past hour.
+- The processing capacity per core: this depends on the maximum number of messages per core configured in the `flink-sort-plugin.properties` file.
+
+With these two pieces of data, the total resource requirement of a task can be calculated. The feature provides a switch so it can be enabled or disabled as needed.
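+
+In essence, the calculation divides observed throughput by per-core capacity. The sketch below shows that arithmetic only; the hourly averaging window comes from the audit data described above, while the rounding and the floor of one core are assumptions — the authoritative formula is in the PR linked below.
+
+```java
+public class SortParallelismSketch {
+    /** Estimate Sort task parallelism from audited throughput and per-core capacity. */
+    static int estimateParallelism(long avgRecordsPerHour, long maxRecordsPerSecondPerCore) {
+        double recordsPerSecond = avgRecordsPerHour / 3600.0;  // audited hourly average
+        int cores = (int) Math.ceil(recordsPerSecond / maxRecordsPerSecondPerCore);
+        return Math.max(cores, 1);                             // assumed floor of one core
+    }
+
+    public static void main(String[] args) {
+        // e.g. 360 million records/hour at 10,000 records/s per core -> 10 cores
+        System.out.println(estimateParallelism(360_000_000L, 10_000L));
+    }
+}
+```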
+
+Thanks to @PeterZh6 for their contributions to this feature. For more details, please refer to [INLONG-10916](https://github.com/apache/inlong/pull/10916).
+
+### Sort Standalone Supports HTTP Sink
+InLong Sort Standalone is responsible for consuming data from MQ and distributing it to various data storage modules, supporting multiple data stores such as Elasticsearch and CLS.
+
+Compared to SortFlink, Sort Standalone offers higher performance and lower latency, making it suitable for scenarios with high performance requirements.
+
+HTTP is a widely used communication protocol. Sort Standalone now supports HTTP output, allowing data to be sent to any HTTP interface without concern for the specific storage implementation, adapting more flexibly to different business scenarios.
+
+The processing flow for HTTP output is as follows:
+
+![2.0.0-sortstandalone-http.png](img/2.0.0/2.0.0-sortstandalone-http.png)
+
+HTTP output has the following features:
+
+- SortSDK is responsible for consuming data from MQ
+- Supports semaphore-based traffic control capabilities
+- Metadata management relies on Manager, supporting dynamic updates
+- The output protocol is HTTP, decoupling specific storage implementations
+- Supports retry strategies
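+
+As an illustration of the last two points — an HTTP output decoupled from storage, plus retries — here is a hedged, standalone Java sketch using the JDK's built-in HTTP client. It is not Sort Standalone's actual implementation; the endpoint URL and the fixed three-attempt linear-backoff policy are assumptions (a real deployment obtains endpoints from Manager metadata).
+
+```java
+import java.net.URI;
+import java.net.http.HttpClient;
+import java.net.http.HttpRequest;
+import java.net.http.HttpResponse;
+
+public class HttpSinkSketch {
+    // Hypothetical endpoint; in SortStandalone this would come from Manager metadata.
+    private static final String ENDPOINT = "http://example.com/ingest";
+    private static final HttpClient CLIENT = HttpClient.newHttpClient();
+
+    /** Send one event as JSON, retrying a few times with linear backoff. */
+    static boolean send(String jsonEvent) throws InterruptedException {
+        for (int attempt = 1; attempt <= 3; attempt++) {          // assumed retry policy
+            try {
+                HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
+                        .header("Content-Type", "application/json")
+                        .POST(HttpRequest.BodyPublishers.ofString(jsonEvent))
+                        .build();
+                HttpResponse<String> response =
+                        CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
+                if (response.statusCode() / 100 == 2) {
+                    return true;                                  // 2xx: delivered
+                }
+            } catch (Exception e) {
+                // network error: fall through and retry
+            }
+            Thread.sleep(1000L * attempt);                        // linear backoff
+        }
+        return false;                                             // caller can re-queue to MQ
+    }
+
+    public static void main(String[] args) throws InterruptedException {
+        System.out.println(send("{\"inlongGroupId\":\"demo\",\"data\":\"hello\"}"));
+    }
+}
+```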
+
+Thanks to @yfsn666 and @fuweng11 for their contributions to this feature. For more details, please refer to [INLONG-10831](https://github.com/apache/inlong/pull/10831) and [INLONG-10884](https://github.com/apache/inlong/pull/10884).
+
+### Full-Link Management of Offline Synchronization
+
+Version 2.0.0 adds full-link management capabilities for offline data synchronization tasks; offline tasks are configured much like real-time synchronization. The specific process is as follows:
+
+First, create the Group information for the synchronization task.
+
+![2.0.0-offline-sync-group.png](img/2.0.0/2.0.0-offline-sync-group.png)
+
+Note that the "Sync Type" should be selected as "Offline."
+
+The second step is to configure the scheduling information for the offline task.
+
+![2.0.0-offline-schedule-common.png](img/2.0.0/2.0.0-offline-schedule-common.png)
+
+The conventional scheduling configuration requires setting the following parameters:
+- Scheduling unit: Supports minute, hour, day, month, year, and single execution (single execution means the task will only run once)
+- Scheduling cycle: Represents the time interval between two task schedules
+- Delay time: Represents the delay time for task startup
+- Valid time: Includes start and end times; scheduled tasks will only run within this time range
+
+In addition to the conventional scheduling method, Crontab configuration is also supported.
+
+![2.0.0-offline-schedule-crontab.png](img/2.0.0/2.0.0-offline-schedule-crontab.png)
+
+Crontab scheduling requires setting the following parameters:
+- Valid time: Includes start and end times; scheduled tasks will only run within this time range
+- Crontab expression: Represents the task cycle, for example, `0 */5 * * * ?`
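+
+The `?` and the six fields mark this as a Quartz-style expression (seconds field first). Assuming the Quartz library is on the classpath — whether InLong's built-in scheduler uses Quartz internally is an assumption here — an expression can be validated and its firing times previewed like this:
+
+```java
+import java.util.Date;
+import org.quartz.CronExpression;
+
+public class CronCheck {
+    public static void main(String[] args) throws Exception {
+        String expr = "0 */5 * * * ?";  // seconds, minutes, hours, day-of-month, month, day-of-week
+        System.out.println(CronExpression.isValidExpression(expr));  // true: fires every 5 minutes
+
+        CronExpression cron = new CronExpression(expr);
+        Date next = cron.getNextValidTimeAfter(new Date());
+        System.out.println("next fire time: " + next);               // next 5-minute boundary
+    }
+}
+```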
+
+The third step is to create a Stream and configure the data source and target information, which is consistent with real-time synchronization and will not be repeated here.
+For more details, please refer to [Offline Synchronization Pulsar->MySQL](https://inlong.apache.org/docs/next/quick_start/offline_data_sync/pulsar_mysql_example).
+
+Thanks to @wohainilaodou for their contributions to this feature; see [INLONG-10779](https://github.com/apache/inlong/issues/10779).
+
+## Summary and Future Plans
+Version 2.0.0 is the first release of the 2.x series; the technical capability framework is now basically in place, and we welcome everyone to use it.
+If you have more scenarios and requirements, or encounter issues during use, please feel free to raise issues and PRs.
+In future versions, the InLong community will continue to:
+- Support more data source collection capabilities.
+- Enrich Flink 1.15 and 1.18 connectors.
+- Continuously enhance Transform capabilities.
+- Provide real-time synchronization support for more data sources and targets.
+- Advance offline integration, supporting third-party scheduling engines.
+- Optimize SDK capabilities and user experience.
+- Improve Dashboard experience.
+
+We also look forward to more developers interested in InLong contributing to and helping drive the project's development!
\ No newline at end of file
diff --git a/blog/img/2.0.0/2.0.0-oceanbase-detail.png b/blog/img/2.0.0/2.0.0-oceanbase-detail.png
new file mode 100644
index 00000000000..71c2e11bea9
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-oceanbase-detail.png differ
diff --git a/blog/img/2.0.0/2.0.0-oceanbase-target.png b/blog/img/2.0.0/2.0.0-oceanbase-target.png
new file mode 100644
index 00000000000..4187626ef0d
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-oceanbase-target.png differ
diff --git a/blog/img/2.0.0/2.0.0-oceanbase-type.png b/blog/img/2.0.0/2.0.0-oceanbase-type.png
new file mode 100644
index 00000000000..07228cdaadc
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-oceanbase-type.png differ
diff --git a/blog/img/2.0.0/2.0.0-offline-schedule-common.png b/blog/img/2.0.0/2.0.0-offline-schedule-common.png
new file mode 100644
index 00000000000..597e43f862f
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-offline-schedule-common.png differ
diff --git a/blog/img/2.0.0/2.0.0-offline-schedule-crontab.png b/blog/img/2.0.0/2.0.0-offline-schedule-crontab.png
new file mode 100644
index 00000000000..03faa33286c
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-offline-schedule-crontab.png differ
diff --git a/blog/img/2.0.0/2.0.0-offline-sync-group.png b/blog/img/2.0.0/2.0.0-offline-sync-group.png
new file mode 100644
index 00000000000..db56ad50434
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-offline-sync-group.png differ
diff --git a/blog/img/2.0.0/2.0.0-sortstandalone-http.png b/blog/img/2.0.0/2.0.0-sortstandalone-http.png
new file mode 100644
index 00000000000..71a83b30bac
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-sortstandalone-http.png differ
diff --git a/blog/img/2.0.0/2.0.0-transform-background.png b/blog/img/2.0.0/2.0.0-transform-background.png
new file mode 100644
index 00000000000..b998cc071e9
Binary files /dev/null and b/blog/img/2.0.0/2.0.0-transform-background.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/2024-10-20-release-2.0.0.md b/i18n/zh-CN/docusaurus-plugin-content-blog/2024-10-20-release-2.0.0.md
new file mode 100644
index 00000000000..fadd81b22b2
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/2024-10-20-release-2.0.0.md
@@ -0,0 +1,256 @@
+---
+title: Release 2.0.0
+author: Aloys Zhang
+author_url: https://github.com/aloyszhang
+author_image_url: https://avatars.githubusercontent.com/u/48062889?v=4?s=400
+tags: [Apache InLong, Version]
+---
+
+Apache InLong (应龙) has recently released version 2.0.0, which closed more than 315 issues, including 6 major features and over 96 optimizations. The main accomplishments include support for the Transform SDK and its integration into the ES Sink of Sort Standalone, OceanBase data source management, adaptive resource configuration for Sort, HTTP output for Sort Standalone, community documentation restructuring, and full-link support for offline synchronization.
+After the 2.0.0 release, Apache InLong has added Transform capabilities, improved support for the Agent Pulsar Source, enriched the capabilities and applicable scenarios of Sort, optimized the display of the InLong Dashboard, and addressed various issues and usability concerns encountered during InLong's operation and maintenance.
+<!--truncate-->
+
+## About Apache InLong
+As the industry's first one-stop, all-scenario massive data integration framework, Apache InLong (应龙) provides automated, secure, reliable, and high-performance data transmission capabilities, enabling businesses to quickly build stream-based data analysis, modeling, and applications. Currently, InLong is widely used in industries such as advertising, payment, social networking, gaming, and artificial intelligence, serving thousands of businesses, with high-performance scenarios processing over one hundred trillion records per day and high-reliability scenarios handling over ten trillion records per day.
+
+The core keywords of InLong's project positioning are "one-stop," "all-scenario," and "massive data." For "one-stop," we aim to shield technical details, provide complete data integration and supporting services, and achieve out-of-the-box usability; for "all-scenario," we aim to offer comprehensive solutions covering the common data integration scenarios in the big data field; for "massive data," we aim to leverage architectural advantages such as layered data links, fully extensible components, and built-in multi-cluster management to stably support even larger data volumes on top of one hundred trillion records per day.
+
+## From 1.0 to 2.0
+In version 1.13.0, InLong added the underlying framework for offline synchronization tasks, with back-end support for Flink's stream-batch integration capability. Version 2.0.0 fixed several issues in the built-in scheduler, connected the front and back ends, and enabled the configuration of offline synchronization tasks on the Dashboard, providing full-process support for the configuration and management of offline synchronization tasks. Based on this capability, users can manage real-time and offline synchronization tasks uniformly.
+
+In historical versions, InLong's standard and lightweight architectures focused on data collection, reporting, and loading into warehouses and lakes, with relatively weak support for fine-grained operations on data. Therefore, after version 1.13.0, a very important task for the InLong community was to enhance Transform support; with Transform, users can process data more flexibly at any stage of data integration.
+
+Transform is built on general SQL semantics. The Transform capability framework is now complete, supporting over 180 custom Transform functions, and the design guarantees Transform's extensibility, allowing users to flexibly define new Transform capabilities.
+
+In addition to the Transform feature, another focus of the InLong community has been the restructuring of the community documentation. On one hand, we have filled in missing documentation and updated outdated content; on the other hand, we have reorganized the documentation into user guides, core system introductions, development guides, and system administration, providing better explanations from the user, developer, and operations perspectives. With the restructured documentation, users can more easily use InLong, better understand its operation and management mechanisms, and quickly develop custom plugins to meet customization needs.
+
+Overall, as of the 2.0.0 release:
+- InLong has achieved full-link support for offline synchronization tasks, providing stream-batch integrated data processing capabilities
+- InLong has greatly enriched the capabilities of T (Transform), laying the foundation for future support of ELT/EtLT pipelines
+- The documentation overhaul has made InLong more user-friendly, better attracting users to understand, use, and co-build InLong
+
+InLong now supports both standard and lightweight architectures, stream-batch integrated data synchronization, and flexible data Transform capabilities. Therefore, the community has decided to upgrade InLong to version 2.0.0, marking the official entry of Apache InLong into the 2.0 era.
+
+## 2.0.0 Overview
+Apache InLong (应龙) has recently released version 2.0.0, which closed more than 315 issues, including 6 major features and over 96 optimizations, achieving the following:
+- Support for the Transform SDK, integrated into the ES Sink of Sort Standalone
+- New OceanBase data source management capabilities
+- Adaptive resource configuration for Sort tasks
+- HTTP output for Sort Standalone
+- Restructured and optimized community documentation
+- Stream-batch integration, with full-link support for offline synchronization
+
+In addition to the above features, version 2.0.0 also:
+- Improved support for the Agent Pulsar Source
+- Enriched the capabilities and usage scenarios of Sort
+- Fixed issues encountered during InLong operation and maintenance
+- Optimized the display and user experience of the Dashboard
+
+### Agent Module
+- Optimized the Pulsar Source implementation to fix inaccurate consumption offsets
+- Added support for data backfill filtering
+- Added support for reporting Agent status
+- Updated the Redis, Oracle, SQLServer, and MQTT data source implementations
+
+### Dashboard Module
+- Added an offline synchronization configuration page for data synchronization
+- Optimized the style and structure of data preview
+- Added a heartbeat display page to cluster node management
+- Added cluster name display to data source information
+- Supported custom ASCII code options for source data field delimiters
+- Merged metric items with other items on the module review page
+- Enabled delete operations for cluster management and template management
+- Fixed errors in data preview
+- Added support for OceanBase data sources
+
+### Manager Module
+- Added support for OceanBase data source management
+- Added TubeMQ configuration capabilities for Sort Standalone
+- Supported asynchronous Agent installation and the display of Agent installation logs
+- Supported configuring HTTP-type Sinks
+- Supported paginated queries for detailed Sort task information
+- Data preview supports KV data types, escape characters, and filtering Tube data by StreamId
+- Added data filtering functionality
+- Permission optimization: regular users are not allowed to modify Group information if they are not the Owner
+- Fixed offline synchronization update errors
+- Fixed field misalignment in data preview
+
+### SDK Module
+- Transform supports data partitioning using GroupBy semantics
+- Transform supports parsing Map nodes in JSON or PB data
+- The Transform JSON data source supports multidimensional arrays
+- Transform supports ELT functionality
+- Transform supports configuring and parsing Transform annotations
+- Transform supports various data source types: JSON, PB, XML, YAML, BSON, AVRO, ORC, PARQUET, etc.
+- Transform supports arithmetic functions: ceil, floor, sin, cos, cot, tanh, cosh, asin, atan, mod, etc.
+- Transform supports date and time functions: year, quarter, month, week, from_unixtime, unix_timestamp, to_timestamp, etc.
+- Transform supports string functions: substring, replace, reverse, etc.
+- Transform supports common encoding and encryption functions: MD5, ASCII, SHA
+- Transform supports base-conversion and bitwise operation functions: HEX, Bitwise
+- Transform supports compression and decompression functions: GZIP, ZIP, etc.
+- Transform includes other common functions: case conversion, IN, NOT IN, EXISTS, etc.
+- DataProxy Java SDK: Shaded the native library to reduce conflicts with other SDKs
+- DataProxy Java SDK: Optimized the sending jitter issue during metadata changes
+- DataProxy CPP SDK: Improved memory management and optimized build scripts
+- DataProxy CPP SDK: Supports multiple protocols
+- DataProxy CPP SDK: Added a message manager and optimized data reception capabilities
+- DataProxy CPP SDK: Supports forking subprocesses
+- DataProxy Python SDK: Updated build scripts and supports skipping the CPP SDK build step
+
+
+### Sort Module
+- Adjusted the resources required for Sort tasks based on data scale
+- Added support for OceanBase data sources
+- Supported Elasticsearch 6 and Elasticsearch 7 connectors on Flink 1.18
+- Sort Standalone Elasticsearch Sink supports Transform
+- Sort Standalone supports HTTP Sink and batch sorting
+- Connectors support OpenTelemetry log reporting
+- Optimized Kafka connector producer parameters
+- Added end-to-end test cases for Flink 1.15
+
+### Audit Module
+- Supported global memory control in the Audit SDK
+- Optimized day-granularity audit data statistics
+- The Audit SDK supports custom local IP settings
+- Unified the audit aggregation interval range
+- Resolved Protobuf version conflicts between the Audit SDK and other components
+
+## 2.0.0 Feature Introduction
+
+### New Transform Capabilities
+InLong Transform strengthens InLong's access and distribution capabilities: the input side adapts to richer data protocols and reporting scenarios, while the output side accommodates complex and diverse data analysis scenarios.
+It improves data quality and collaboration, providing computing capabilities such as connection, aggregation, filtering, grouping, value extraction, and sampling, all decoupled from the computation engine.
+It simplifies the pre-processing required of users who report data, lowers the barrier to data usage, and streamlines the steps needed before users can start analyzing data, focusing on the business value of data and achieving the goal of making data "visible and usable."
+
+![2.0.0-transform-background.png](img/2.0.0/2.0.0-transform-background.png)
+
+Transform has a wide range of application scenarios. Here are some typical examples:
+- Data cleaning: during data integration, data from different sources must be cleaned to eliminate errors, duplicates, and inconsistencies. Transform capabilities help enterprises clean data more effectively and improve data quality
+- Data fusion: combining data from different sources for unified analysis and reporting. Transform capabilities can handle data of various formats and structures, achieving data fusion and integration
+- Data standardization: converting data into a unified standard format for cross-system and cross-platform analysis. Transform capabilities help enterprises achieve data standardization and normalization
+- Data partitioning and indexing: partitioning and indexing data to improve the performance of queries and analysis. Transform capabilities can dynamically adjust the field values used for partitioning and indexing, improving data warehouse performance
+- Data aggregation and calculation: during data analysis, aggregating and computing over data extracts valuable business information. Transform capabilities can perform complex aggregations and calculations, covering multidimensional data analysis
+- Data security and privacy protection: during data integration, data security and privacy must be ensured. Transform capabilities can implement data masking, encryption, and authorization management to protect data security and privacy
+- Cross-team data sharing: for data security, share only a filtered subset of a data stream; to decouple data dependencies, agree on a data interface with partner teams and dynamically adjust the merging of multiple streams into that interface
+
+The main features of Transform:
+- Supports rich data protocols
+- Decoupled from the computing engine
+- Supports rich transformation functions
+- Supports lossless, imperceptible changes
+- Supports automatic scaling up and down
+
+Transform now supports a rich set of data formats and custom functions, and users can flexibly process data using SQL. Thanks to @luchunliang, @vernedeng, @emptyOVO, @ying-hua, @Zkplo, @MOONSakura0614, and @Ybszzzziz for their contributions.
+For more details, please refer to [Transform SDK Issues](https://github.com/aloyszhang/inlong/blob/master/CHANGES.md#sdk).
+
+### Community Documentation Restructuring
+As the InLong community develops, InLong's capabilities keep growing, but the community documentation has suffered from missing or outdated content. To address this, the InLong community initiated a restructuring of the community documentation to better help users understand and use InLong.
+
+The main content includes:
+- Optimized document structure, better organizing document content
+- Enhanced quick-start examples:
+  - Offline synchronization usage examples
+  - Transform SDK usage examples
+  - Data subscription usage examples
+  - HTTP message reporting usage examples
+- Improved SDK documentation:
+  - DataProxy: C++, Java, Golang, and Python SDKs, and the HTTP data reporting manual
+  - TubeMQ SDK: C++, Java, and Golang SDK usage manuals
+- Enhanced development guides:
+  - Code compilation guides
+  - Data protocol documentation for each component
+  - Extension development documentation for each component
+  - REST API documentation
+- Improved management articles: documentation on user management, approval management, tenant management, node management, cluster management, tag management, template management, and Agent management
+
+Currently, the community documentation has seen significant improvements in usage, development, and management guides. Thanks to @aloyszhang, @fuweng11, @vernedeng, @luchunliang, @gosonzhang, @doleyzi, @baomingyu, @justinwwhuang, and @wohainilaodou for their contributions to the documentation restructuring. For more details, please refer to the [official website](https://inlong.apache.org/docs/next/introduction).
+
+### New OceanBase Data Source
+OceanBase Database is a distributed relational database characterized by high availability and scalability, suitable for large-scale data storage and processing scenarios. InLong version 2.0.0 adds support for OceanBase data sources, allowing data to be imported from data sources into OceanBase.
+Managing OceanBase data nodes is similar to MySQL. To create a new OceanBase node, fill in the node name, type (OceanBase), username, password, address, and other key information.
+
+![2.0.0-oceanbase-detail.png](img/2.0.0/2.0.0-oceanbase-detail.png)
+
+To write data into OceanBase, first create a data target of type `OceanBase`.
+
+![2.0.0-oceanbase-type.png](img/2.0.0/2.0.0-oceanbase-type.png)
+
+Then fill in the relevant information, including the name, the data node information, the database and table names of the data target, and the primary key of the target table.
+
+![2.0.0-oceanbase-target.png](img/2.0.0/2.0.0-oceanbase-target.png)
+
+Thanks to @xxsc0529 for contributing this feature. For more details, please refer to [INLONG-10700](https://github.com/apache/inlong/pull/10700), [INLONG-10701](https://github.com/apache/inlong/pull/10701), and [INLONG-10704](https://github.com/apache/inlong/pull/10704).
+
+### Dynamic Resource Calculation for Sort Tasks
+The total resources (task parallelism) for Flink Sort jobs come from the configuration file `flink-sort-plugin.properties`, which means all submitted Sort jobs use the same amount of resources.
+When the data scale is large, resources may be insufficient; when the data scale is small, resources may be wasted.
+
+Therefore, dynamically calculating the required resources based on data volume is a much-needed feature. InLong now supports dynamically calculating the total resource requirements of a task based on its data volume, using two core inputs:
+- The data volume of the task: this relies on the audit system and is taken from the average data volume recorded by `DataProxy` over the past hour
+- The processing capacity per core: this depends on the maximum number of messages per core configured in `flink-sort-plugin.properties`
+
+With these two inputs, the total resource requirement of a task can be calculated. The feature can be switched on or off as needed.
+
+Thanks to @PeterZh6 for contributing this feature. For more details, please refer to [INLONG-10916](https://github.com/apache/inlong/pull/10916).
+
+### Sort Standalone Supports HTTP Sink
+InLong Sort Standalone is the module that consumes data from MQ and distributes it to different data stores, supporting multiple data stores such as Elasticsearch and CLS.
+
+Compared to SortFlink, Sort Standalone offers higher performance and lower latency, making it suitable for scenarios with high performance requirements.
+
+HTTP is a widely used communication protocol. Sort Standalone supports HTTP output, allowing data to be sent to any HTTP interface without concern for the specific storage implementation, adapting more flexibly to different business scenarios.
+
+The processing flow for HTTP output is as follows:
+
+![2.0.0-sortstandalone-http.png](img/2.0.0/2.0.0-sortstandalone-http.png)
+
+HTTP output has the following features:
+- SortSDK is responsible for consuming data from MQ
+- Supports semaphore-based traffic control
+- Metadata management relies on Manager and supports dynamic updates
+- The output protocol is HTTP, decoupling the specific storage implementation
+- Supports retry strategies
+
+Thanks to @yfsn666 and @fuweng11 for contributing this feature. For more details, please refer to [INLONG-10831](https://github.com/apache/inlong/pull/10831) and [INLONG-10884](https://github.com/apache/inlong/pull/10884).
+
+### Full-Link Management of Offline Synchronization
+Version 2.0.0 adds full-link management capabilities for offline data synchronization tasks; offline tasks are configured much like real-time synchronization. The specific process is as follows.
+First, create the Group information for the synchronization task.
+
+![2.0.0-offline-sync-group.png](img/2.0.0/2.0.0-offline-sync-group.png)
+
+Note that the "Sync Type" should be set to "Offline."
+
+The second step is to configure the scheduling information for the offline task.
+
+![2.0.0-offline-schedule-common.png](img/2.0.0/2.0.0-offline-schedule-common.png)
+
+The conventional scheduling configuration requires setting the following parameters:
+- Scheduling unit: supports minute, hour, day, month, year, and single execution (single execution means the task runs only once)
+- Scheduling cycle: the time interval between two task schedules
+- Delay time: the delay before task startup
+- Valid time: includes start and end times; scheduled tasks will only run within this time range
+
+In addition to the conventional scheduling method, Crontab configuration is also supported.
+
+![2.0.0-offline-schedule-crontab.png](img/2.0.0/2.0.0-offline-schedule-crontab.png)
+
+Crontab scheduling requires setting the following parameters:
+- Valid time: includes start and end times; scheduled tasks will only run within this time range
+- Crontab expression: represents the task cycle, for example, `0 */5 * * * ?`
+
+The third step is to create a Stream and configure the data source and target information; this is consistent with real-time synchronization and will not be repeated here. For more details, please refer to [Offline Synchronization Pulsar->MySQL](https://inlong.apache.org/zh-CN/docs/next/quick_start/offline_data_sync/pulsar_mysql_example).
+
+Thanks to @wohainilaodou for their contributions. For more details, please refer to [INLONG-10779](https://github.com/apache/inlong/issues/10779).
+
+## Summary and Future Plans
+Version 2.0.0 is the first release of the 2.x series; the technical capability framework is now basically in place, and we welcome everyone to use it. If you have more scenarios and requirements, or encounter issues during use, please feel free to raise issues and PRs. In future versions, the InLong community will continue to:
+- Support more data source collection capabilities
+- Enrich the Flink 1.15 and 1.18 connectors
+- Continuously enhance Transform capabilities
+- Support more data sources and targets for real-time synchronization
+- Advance offline integration, supporting third-party scheduling engines
+- Optimize SDK capabilities and user experience
+- Improve the Dashboard experience
+
+We also look forward to more developers interested in InLong joining and contributing.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-detail.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-detail.png
new file mode 100644
index 00000000000..0b230803177
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-detail.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-target.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-target.png
new file mode 100644
index 00000000000..73c0c69f9a6
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-target.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-type.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-type.png
new file mode 100644
index 00000000000..a86fbf9190b
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-oceanbase-type.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-schedule-common.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-schedule-common.png
new file mode 100644
index 00000000000..49721ebce06
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-schedule-common.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-schedule-crontab.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-schedule-crontab.png
new file mode 100644
index 00000000000..065bd062f71
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-schedule-crontab.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-sync-group.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-sync-group.png
new file mode 100644
index 00000000000..475b2cabcd7
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-offline-sync-group.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-sortstandalone-http.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-sortstandalone-http.png
new file mode 100644
index 00000000000..71a83b30bac
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-sortstandalone-http.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-transform-background.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-transform-background.png
new file mode 100644
index 00000000000..1d13a4c0adc
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/2.0.0/2.0.0-transform-background.png differ

