This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 754daf65a95 update log storage docs (#1004) 754daf65a95 is described below commit 754daf65a95912a0a5e169dcf183cc24e8eb340d Author: Kang <kxiao.ti...@gmail.com> AuthorDate: Thu Aug 15 13:57:49 2024 +0800 update log storage docs (#1004) 1. config: enable_compaction_priority_scheduling, total_permits_for_compaction_score, inverted_index_ram_dir_enable 2. remove misused \ --- .../tutorials/log-storage-analysis.md | 114 ++++++------ gettingStarted/tutorials/log-storage-analysis.md | 196 ++++++++++---------- .../practical-guide/log-storage-analysis.md | 204 +++++++++++---------- .../practical-guide/log-storage-analysis.md | 202 ++++++++++---------- .../practical-guide/log-storage-analysis.md | 198 ++++++++++---------- 5 files changed, 472 insertions(+), 442 deletions(-) diff --git a/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md b/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md index 7ee35679f3b..1a579033043 100644 --- a/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md +++ b/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md @@ -170,7 +170,7 @@ Apache Doris 对 Flexible Schema 的日志数据提供了几个方面的支持 | 需调整参数 | 说明 | | :----------------------------------------------------------- | :----------------------------------------------------------- | | `max_running_txn_num_per_db = 10000` | 高并发导入运行事务数较多,需调高参数。 | -| `streaming_label_keep_max_second = 3600``label_keep_max_second = 7200` | 高频导入事务标签内存占用多,保留时间调短。 | +| `streaming_label_keep_max_second = 3600` `label_keep_max_second = 7200` | 高频导入事务标签内存占用多,保留时间调短。 | | `enable_round_robin_create_tablet = true` | 创建 Tablet 时,采用 Round Robin 策略,尽量均匀。 | | `tablet_rebalancer_type = partition` | 均衡 Tablet 时,采用每个分区内尽量均匀的策略。 | | `enable_single_replica_load = true` | 开启单副本导入,多个副本只需构建一次索引,减少 CPU 消耗。 | @@ -194,10 +194,13 @@ Apache Doris 对 Flexible Schema 的日志数据提供了几个方面的支持 | Compaction | `max_cumu_compaction_threads = 8` | 设置为 CPU 核数 / 4,意味着 CPU 资源的 1/4 用于写入,1/4 用于后台 Compaction,2/1 留给查询和其他操作。 | | - | `inverted_index_compaction_enable = true` | 开启索引合并(index compaction),减少 Compaction 时的 CPU 消耗。 | | - | `enable_segcompaction = false` `enable_ordered_data_compaction = false` | 关闭日志场景不需要的两个 Compaction 功能。 | +| - | `enable_compaction_priority_scheduling = false` | 低优先级compaction在一块盘上限制 2 个任务,会影响compaction 速度。 | +| - | `total_permits_for_compaction_score = 200000 ` | 该参数用来控制内存,time series 策略下本身可以控制内存。 | | 缓存 | `disable_storage_page_cache = true` `inverted_index_searcher_cache_limit = 30%` | 因为日志数据量较大,缓存(cache)作用有限,因此关闭数据缓存,调换为索引缓存(index cache)的方式。 | | - | `inverted_index_cache_stale_sweep_time_sec = 3600` `index_cache_entry_stay_time_after_lookup_s = 3600` | 让索引缓存在内存中尽量保留 1 小时。 | | - | `enable_inverted_index_cache_on_cooldown = true` <br />`enable_write_index_searcher_cache = false` | 开启索引上传冷数据存储时自动缓存的功能。 | | - | `tablet_schema_cache_recycle_interval = 3600` `segment_cache_capacity = 20000` | 减少其他缓存对内存的占用。 | +| - | `inverted_index_ram_dir_enable = true` | 减少写入时索引临时文件带来的IO开销。| | 线程 | `pipeline_executor_size = 24` `doris_scanner_thread_pool_thread_num = 48` | 32 核 CPU 的计算线程和 I/O 线程配置,根据核数等比扩缩。 | | - | `scan_thread_nice_value = 5` | 降低查询 I/O 线程的优先级,保证写入性能和时效性。 | | 其他 | `string_type_length_soft_limit_bytes = 10485760` | 将 String 类型数据的长度限制调高至 10 MB。 | @@ -241,7 +244,7 @@ Apache Doris 对 Flexible Schema 的日志数据提供了几个方面的支持 - 对于热存储数据,如果使用云盘,可配置 1 副本;如果使用物理盘,则至少配置 2 副本。 - 配置 `log_s3` 的存储位置,并设置 `log_policy_3day` 冷热数据分层策略,即在超过 3 天后将数据冷却至 `log_s3` 指定的存储位置。可参考以下代码: -```Go 
+```SQL CREATE DATABASE log_db; USE log_db; @@ -278,6 +281,7 @@ DUPLICATE KEY(`ts`) PARTITION BY RANGE(`ts`) () DISTRIBUTED BY RANDOM BUCKETS 250 PROPERTIES ( +"compaction_policy" = "time_series", "dynamic_partition.enable" = "true", "dynamic_partition.create_history_partition" = "true", "dynamic_partition.time_unit" = "DAY", @@ -285,11 +289,10 @@ PROPERTIES ( "dynamic_partition.end" = "1", "dynamic_partition.prefix" = "p", "dynamic_partition.buckets" = "250", -"dynamic_partition.replication_num" = "1", -- 存算分离不需要 -"replication_num" = "1" -- 存算分离不需要 +"dynamic_partition.replication_num" = "2", -- 存算分离不需要 +"replication_num" = "2" -- 存算分离不需要 "enable_single_replica_compaction" = "true", -- 存算分离不需要 -"storage_policy" = "log_policy_3day", -- 存算分离不需要 -"compaction_policy" = "time_series" +"storage_policy" = "log_policy_3day" -- 存算分离不需要 ); ``` @@ -309,7 +312,7 @@ Apache Doris 提供开放、通用的 Stream HTTP APIs,通过这些 APIs,你 - 从源码编译,并运行下方命令安装: -```markdown +``` ./bin/logstash-plugin install logstash-output-doris-1.0.0.gem ``` @@ -317,7 +320,7 @@ Apache Doris 提供开放、通用的 Stream HTTP APIs,通过这些 APIs,你 - `logstash.yml`:配置 Logstash 批处理日志的条数和时间,用于提升数据写入性能。 -```markdown +``` pipeline.batch.size: 1000000 pipeline.batch.delay: 10000 ``` @@ -325,36 +328,39 @@ pipeline.batch.delay: 10000 - `logstash_demo.conf`:配置所采集日志的具体输入路径和输出到 Apache Doris 的设置。 -```markdown +``` input { -file { -path => "/path/to/your/log" -} -} -<br />output { -doris { -http_hosts => \[ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port"\>] -user => "your_username" -password => "your_password" -db => "your_db" -table => "your_table" -\# doris stream load http headers -headers => { -"format" => "json" -"read_json_by_line" => "true" -"load_to_single_tablet" => "true" -} -\# field mapping: doris fileld name => logstash field name -\# %{} to get a logstash field, \[\] for nested field such as \[host\]\[name\] for host.name -mapping => { -"ts" => "%{@timestamp}" -"host" => "%{\[host\]\[name\]}" -"path" => "%{\[log\]\[file\]\[path\]}" -"message" => "%{message}" -} -log_request => true -log_speed_interval => 10 + file { + path => "/path/to/your/log" + } } + +output { + doris { + http_hosts => [ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port">] + user => "your_username" + password => "your_password" + db => "your_db" + table => "your_table" + + # doris stream load http headers + headers => { + "format" => "json" + "read_json_by_line" => "true" + "load_to_single_tablet" => "true" + } + + # field mapping: doris fileld name => logstash field name + # %{} to get a logstash field, [] for nested field such as [host][name] for host.name + mapping => { + "ts" => "%{@timestamp}" + "host" => "%{[host][name]}" + "path" => "%{[log][file][path]}" + "message" => "%{message}" + } + log_request => true + log_speed_interval => 10 + } } ``` @@ -446,11 +452,11 @@ chmod +x filebeat-doris-1.0.0 将 JSON 格式的日志写入 Kafka 的消息队列,创建 Kafka Routine Load,即可让 Apache Doris 从 Kafka 主动拉取数据。 -可参考如下示例。其中,`property.\*` 是 Librdkafka 客户端相关配置,根据实际 Kafka 集群情况配置。 +可参考如下示例。其中,`property.*` 是 Librdkafka 客户端相关配置,根据实际 Kafka 集群情况配置。 ```SQL -\-- 准备好kafka集群和topic log_\_topic_ -\-- 创建routine load,从kafka log_\_topic_将数据导入log_table表 +-- 准备好kafka集群和topic log__topic_ +-- 创建routine load,从kafka log__topic_将数据导入log_table表 CREATE ROUTINE LOAD load_log_kafka ON log_db.log_table COLUMNS(ts, clientip, request, status, size) PROPERTIES ( @@ -464,7 +470,7 @@ PROPERTIES ( ) FROM KAFKA ( "kafka_broker_list" = "host:port", -"kafka_topic" = "log_\_topic_", 
+"kafka_topic" = "log__topic_", "property.group.id" = "your_group_id", "property.security.protocol"="SASL_PLAINTEXT", "property.sasl.mechanism"="GSSAPI", @@ -483,15 +489,15 @@ SHOW ROUTINE LOAD; 除了对接常用的日志采集器以外,你也可以自定义程序,通过 HTTP API Stream Load 将日志数据导入 Apache Doris。参考以下代码: ```Bash -curl \\ -\--location-trusted \\ -\-u username:password \\ -\-H "format:json" \\ -\-H "read_json_by_line:true" \\ -\-H "load_to_single_tablet:true" \\ -\-H "timeout:600" \\ -\-T logfile.json \\ -http://fe_host:fe_http_port/api/log_db/log_table/\_stream_load +curl +--location-trusted +-u username:password +-H "format:json" +-H "read_json_by_line:true" +-H "load_to_single_tablet:true" +-H "timeout:600" +-T logfile.json +http://fe_host:fe_http_port/api/log_db/log_table/_stream_load ``` 在使用自定义程序时,需注意以下关键点: @@ -517,33 +523,33 @@ mysql -h fe_host -P fe_mysql_port -u your_username -Dyour_db_name - 查看最新的 10 条数据 ```SQL -SELECT \* FROM your_table_name ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name ORDER BY ts DESC LIMIT 10; ``` - 查询 `host` 为 `8.8.8.8` 的最新 10 条数据 ```SQL -SELECT \* FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; ``` - 检索请求字段中有 `error` 或者 `404` 的最新 10 条数据。其中,`MATCH_ANY` 是 Apache Doris 全文检索的 SQL 语法,用于匹配参数中任一关键字。 ```SQL -SELECT \* FROM your_table_name WHERE message MATCH_ANY 'error 404' +SELECT * FROM your_table_name WHERE message MATCH_ANY 'error 404' ORDER BY ts DESC LIMIT 10; ``` - 检索请求字段中有 `image` 和 `faq` 的最新 10 条数据。其中,`MATCH_ALL` 是 Apache Doris 全文检索的 SQL 语法,用于匹配参数中所有关键字。 ```SQL -SELECT \* FROM your_table_name WHERE message MATCH_ALL 'image faq' +SELECT * FROM your_table_name WHERE message MATCH_ALL 'image faq' ORDER BY ts DESC LIMIT 10; ``` - 检索请求字段中有 `image` 和 `faq` 的最新 10 条数据。其中,`MATCH_PHRASE` 是 Apache Doris 全文检索的 SQL 语法,用于匹配参数中所有关键字,并且要求顺序一致。在下方例子中,`a image faq b` 能匹配,但是 `a faq image b` 不能匹配,因为 `image` 和 `faq` 的顺序与查询不一致。 ```SQL -SELECT \* FROM your_table_name WHERE message MATCH_PHRASE 'image faq' +SELECT * FROM your_table_name WHERE message MATCH_PHRASE 'image faq' ORDER BY ts DESC LIMIT 10; ``` diff --git a/gettingStarted/tutorials/log-storage-analysis.md b/gettingStarted/tutorials/log-storage-analysis.md index af49c722bf2..51ff5cead14 100644 --- a/gettingStarted/tutorials/log-storage-analysis.md +++ b/gettingStarted/tutorials/log-storage-analysis.md @@ -194,7 +194,7 @@ You can find FE configuration fields in `fe/conf/fe.conf`. Refer to the followin | Configuration fields to be optimized | Description | | :----------------------------------------------------------- | :----------------------------------------------------------- | | `max_running_txn_num_per_db = 10000` | Increase the parameter value to adapt to high-concurrency import transactions. | -| `streaming_label_keep_max_second = 3600``label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | +| `streaming_label_keep_max_second = 3600` `label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | | `enable_round_robin_create_tablet = true` | When creating Tablets, use a Round Robin strategy to distribute evenly. | | `tablet_rebalancer_type = partition` | When balancing Tablets, use a strategy to evenly distribute within each partition. | | `enable_single_replica_load = true` | Enable single-replica import, where multiple replicas only need to build an index once to reduce CPU consumption. 
| @@ -218,10 +218,13 @@ You can find BE configuration fields in `be/conf/be.conf`. Refer to the followin | Compaction | `max_cumu_compaction_threads = 8` | Set to CPU core count / 4, indicating that 1/4 of CPU resources are used for writing, 1/4 for background compaction, and 2/1 for queries and other operations. | | - | `inverted_index_compaction_enable = true` | Enable inverted index compaction to reduce CPU consumption during compaction. | | - | `enable_segcompaction = false` `enable_ordered_data_compaction = false` | Disable two compaction features that are unnecessary for log scenarios. | +| - | `enable_compaction_priority_scheduling = false` | Low-priority compaction is limited to 2 tasks on a single disk, which can affect the speed of compaction. | +| - | `total_permits_for_compaction_score = 200000 ` | The parameter is used to control memory, under the memory time series strategy, the parameter itself can control memory. | | Cache | `disable_storage_page_cache = true` `inverted_index_searcher_cache_limit = 30%` | Due to the large volume of log data and limited caching effect, switch from data caching to index caching. | | - | `inverted_index_cache_stale_sweep_time_sec = 3600` `index_cache_entry_stay_time_after_lookup_s = 3600` | Maintain index caching in memory for up to 1 hour. | | - | `enable_inverted_index_cache_on_cooldown = true`<br />`enable_write_index_searcher_cache = false` | Enable automatic caching of cold data storage during index uploading. | | - | `tablet_schema_cache_recycle_interval = 3600` `segment_cache_capacity = 20000` | Reduce memory usage by other caches. | +| - | `inverted_index_ram_dir_enable = true` | Reduce the IO overhead caused by writing to index files temporarily. | | Thread | `pipeline_executor_size = 24` `doris_scanner_thread_pool_thread_num = 48` | Configure computing threads and I/O threads for a 32-core CPU in proportion to core count. | | - | `scan_thread_nice_value = 5` | Lower the priority of query I/O threads to ensure writing performance and timeliness. | | Other | `string_type_length_soft_limit_bytes = 10485760` | Increase the length limit of string-type data to 10 MB. | @@ -274,7 +277,7 @@ Configure storage policies as follows: - Configure the storage location for log_s3 and set the log_policy_3day policy, where the data is cooled and moved to the specified storage location of log_s3 after 3 days. Refer to the code below. 
-```Go +```SQL CREATE DATABASE log_db; USE log_db; @@ -311,6 +314,7 @@ DUPLICATE KEY(`ts`) PARTITION BY RANGE(`ts`) () DISTRIBUTED BY RANDOM BUCKETS 250 PROPERTIES ( +"compaction_policy" = "time_series", "dynamic_partition.enable" = "true", "dynamic_partition.create_history_partition" = "true", "dynamic_partition.time_unit" = "DAY", @@ -318,11 +322,10 @@ PROPERTIES ( "dynamic_partition.end" = "1", "dynamic_partition.prefix" = "p", "dynamic_partition.buckets" = "250", -"dynamic_partition.replication_num" = "1", -- unneccessary for the compute-storage coupled mode -"replication_num" = "1" -- unneccessary for the compute-storage coupled mode +"dynamic_partition.replication_num" = "2", -- unneccessary for the compute-storage coupled mode +"replication_num" = "2" -- unneccessary for the compute-storage coupled mode "enable_single_replica_compaction" = "true", -- unneccessary for the compute-storage coupled mode -"storage_policy" = "log_policy_3day", -- unneccessary for the compute-storage coupled mode -"compaction_policy" = "time_series" +"storage_policy" = "log_policy_3day" -- unneccessary for the compute-storage coupled mode ); ``` @@ -348,53 +351,56 @@ Follow these steps: 2. Configure Logstash. Specify the following fields: - - `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance. +- `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance. - ```Plain Text - pipeline.batch.size: 1000000 - pipeline.batch.delay: 10000 - ``` +```Plain Text +pipeline.batch.size: 1000000 +pipeline.batch.delay: 10000 +``` - - `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris. +- `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris. - ```markdown - input { +``` +input { file { path => "/path/to/your/log" - } - } - <br />output { - doris { - http_hosts => \[ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port"\>] + } +} + +output { + doris { + http_hosts => [ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port">] user => "your_username" password => "your_password" db => "your_db" table => "your_table" - \# doris stream load http headers + + # doris stream load http headers headers => { "format" => "json" "read_json_by_line" => "true" "load_to_single_tablet" => "true" } - \# field mapping: doris fileld name => logstash field name - \# %{} to get a logstash field, \[\] for nested field such as \[host\]\[name\] for host.name + + # field mapping: doris fileld name => logstash field name + # %{} to get a logstash field, [] for nested field such as [host][name] for host.name mapping => { "ts" => "%{@timestamp}" - "host" => "%{\[host\]\[name\]}" - "path" => "%{\[log\]\[file\]\[path\]}" + "host" => "%{[host][name]}" + "path" => "%{[log][file][path]}" "message" => "%{message}" } log_request => true log_speed_interval => 10 - } - } + } +} ``` 3. Run Logstash according to the command below, collect logs, and output to Apache Doris. - ```Bash - ./bin/logstash -f logstash_demo.conf - ``` +```Bash +./bin/logstash -f logstash_demo.conf +``` For more information about the Logstash Doris Output plugin, see [Logstash Doris Output Plugin](../ecosystem/logstash.md). @@ -406,56 +412,56 @@ Follow these steps: 2. Configure Filebeat. 
Specify the filebeat_demo.yml field that is used to configure the specific input path of the collected logs and the settings for output to Apache Doris. - ```YAML - # input - filebeat.inputs: - - type: log - enabled: true - paths: - - /path/to/your/log - multiline: - type: pattern - pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' - negate: true - match: after - skip_newline: true - - processors: - - script: - lang: javascript - source: > - function process(event) { - var msg = event.Get("message"); - msg = msg.replace(/\t/g, " "); - event.Put("message", msg); - } - - dissect: - # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle - tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" - target_prefix: "" - ignore_failure: true - overwrite_keys: true - - # queue and batch - queue.mem: - events: 1000000 - flush.min_events: 100000 - flush.timeout: 10s - - # output - output.doris: - fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] - user: "your_username" - password: "your_password" - database: "your_db" - table: "your_table" - # output string format - codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' - headers: - format: "json" - read_json_by_line: "true" - load_to_single_tablet: "true" - ``` +```YAML +# input +filebeat.inputs: +- type: log +enabled: true +paths: + - /path/to/your/log +multiline: + type: pattern + pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' + negate: true + match: after + skip_newline: true + +processors: +- script: + lang: javascript + source: > + function process(event) { + var msg = event.Get("message"); + msg = msg.replace(/\t/g, " "); + event.Put("message", msg); + } +- dissect: + # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle + tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" + target_prefix: "" + ignore_failure: true + overwrite_keys: true + +# queue and batch +queue.mem: +events: 1000000 +flush.min_events: 100000 +flush.timeout: 10s + +# output +output.doris: +fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] +user: "your_username" +password: "your_password" +database: "your_db" +table: "your_table" +# output string format +codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' +headers: + format: "json" + read_json_by_line: "true" + load_to_single_tablet: "true" +``` 3. Run Filebeat according to the command below, collect logs, and output to Apache Doris. @@ -470,7 +476,7 @@ For more information about Filebeat, refer to [Beats Doris Output Plugin](../eco Write JSON formatted logs to Kafka's message queue, create a Kafka Routine Load, and allow Apache Doris to actively pull data from Kafka. -You can refer to the example below, where `property.\*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. +You can refer to the example below, where `property.*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. 
```SQL CREATE ROUTINE LOAD load_log_kafka ON log_db.log_table @@ -486,7 +492,7 @@ PROPERTIES ( ) FROM KAFKA ( "kafka_broker_list" = "host:port", -"kafka_topic" = "log_\_topic_", +"kafka_topic" = "log__topic_", "property.group.id" = "your_group_id", "property.security.protocol"="SASL_PLAINTEXT", "property.sasl.mechanism"="GSSAPI", @@ -504,15 +510,15 @@ For more information about Kafka, see [Routine Load](../data-operate/import/rout In addition to integrating common log collectors, you can also customize programs to import log data into Apache Doris using the Stream Load HTTP API. Refer to the following code: ```Bash -curl \\ -\--location-trusted \\ -\-u username:password \\ -\-H "format:json" \\ -\-H "read_json_by_line:true" \\ -\-H "load_to_single_tablet:true" \\ -\-H "timeout:600" \\ -\-T logfile.json \\ -http://fe_host:fe_http_port/api/log_db/log_table/\_stream_load +curl +--location-trusted +-u username:password +-H "format:json" +-H "read_json_by_line:true" +-H "load_to_single_tablet:true" +-H "timeout:600" +-T logfile.json +http://fe_host:fe_http_port/api/log_db/log_table/_stream_load ``` When using custom programs, pay attention to the following key points: @@ -542,33 +548,33 @@ Here are 5 common SQL query commands for reference: - View the latest 10 log entries ```SQL -SELECT \* FROM your_table_name ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name ORDER BY ts DESC LIMIT 10; ``` - Query the latest 10 log entries with the host as 8.8.8.8 ```SQL -SELECT \* FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with error or 404 in the request field. In the command below, MATCH_ANY is a full-text search SQL syntax used by Apache Doris for matching any keyword in the fields. ```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_ANY** 'error 404' +SELECT * FROM your_table_name WHERE message **MATCH_ANY** 'error 404' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with image and faq in the request field. In the command below, MATCH_ALL is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields. ```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_ALL** 'image faq' +SELECT * FROM your_table_name WHERE message **MATCH_ALL** 'image faq' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 entries with image and faq in the request field. In the following command, MATCH_PHRASE is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields and requiring consistent order. In the example below, a image faq b can match, but a faq image b cannot match because the order of image and faq does not match the syntax. 
```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq' +SELECT * FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq' ORDER BY ts DESC LIMIT 10; ``` diff --git a/versioned_docs/version-2.0/practical-guide/log-storage-analysis.md b/versioned_docs/version-2.0/practical-guide/log-storage-analysis.md index 23b13a1006d..51ff5cead14 100644 --- a/versioned_docs/version-2.0/practical-guide/log-storage-analysis.md +++ b/versioned_docs/version-2.0/practical-guide/log-storage-analysis.md @@ -1,6 +1,6 @@ --- { - "title": "Log Storage and Analysis", + "title": "Building log analysis platform", "language": "en" } --- @@ -171,7 +171,7 @@ Refer to the following table to learn about the values of indicators in the exam | Percent of CPU resources reserved for data querying | 50% | Specify the value according to your actual needs. The default value is 50%. | | Estimated number of BE servers | 15.2 | Calculation formula: Number of CPU cores for the peak write throughput / Number of CPU cores of a BE server /(1 - Percent of CPU resources reserved for data querying) | | Rounded number of BE servers | 15 | Calculation formula: MAX (Number of data copies, Estimated number of BE servers) | -| Estimated data storage space for each BE server (TB) | 4.03 | Calculation formula: Estimated storage space for hot data / Estimated number of BE servers /(1 - 30%), where 30% represents the percent of reserved storage space.<br/><br/>It is recommended to mount 4 to 12 data disks on each BE server to enhance I/O capabilities. | +| Estimated data storage space for each BE server (TB) | 4.03 | Calculation formula: Estimated storage space for hot data / Estimated number of BE servers /(1 - 30%), where 30% represents the percent of reserved storage space.<br /><br />It is recommended to mount 4 to 12 data disks on each BE server to enhance I/O capabilities. | ### Step 2: Deploy the cluster @@ -194,7 +194,7 @@ You can find FE configuration fields in `fe/conf/fe.conf`. Refer to the followin | Configuration fields to be optimized | Description | | :----------------------------------------------------------- | :----------------------------------------------------------- | | `max_running_txn_num_per_db = 10000` | Increase the parameter value to adapt to high-concurrency import transactions. | -| `streaming_label_keep_max_second = 3600``label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | +| `streaming_label_keep_max_second = 3600` `label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | | `enable_round_robin_create_tablet = true` | When creating Tablets, use a Round Robin strategy to distribute evenly. | | `tablet_rebalancer_type = partition` | When balancing Tablets, use a strategy to evenly distribute within each partition. | | `enable_single_replica_load = true` | Enable single-replica import, where multiple replicas only need to build an index once to reduce CPU consumption. | @@ -218,10 +218,13 @@ You can find BE configuration fields in `be/conf/be.conf`. Refer to the followin | Compaction | `max_cumu_compaction_threads = 8` | Set to CPU core count / 4, indicating that 1/4 of CPU resources are used for writing, 1/4 for background compaction, and 2/1 for queries and other operations. | | - | `inverted_index_compaction_enable = true` | Enable inverted index compaction to reduce CPU consumption during compaction. 
| | - | `enable_segcompaction = false` `enable_ordered_data_compaction = false` | Disable two compaction features that are unnecessary for log scenarios. | +| - | `enable_compaction_priority_scheduling = false` | Low-priority compaction is limited to 2 tasks on a single disk, which can affect the speed of compaction. | +| - | `total_permits_for_compaction_score = 200000 ` | The parameter is used to control memory, under the memory time series strategy, the parameter itself can control memory. | | Cache | `disable_storage_page_cache = true` `inverted_index_searcher_cache_limit = 30%` | Due to the large volume of log data and limited caching effect, switch from data caching to index caching. | | - | `inverted_index_cache_stale_sweep_time_sec = 3600` `index_cache_entry_stay_time_after_lookup_s = 3600` | Maintain index caching in memory for up to 1 hour. | | - | `enable_inverted_index_cache_on_cooldown = true`<br />`enable_write_index_searcher_cache = false` | Enable automatic caching of cold data storage during index uploading. | | - | `tablet_schema_cache_recycle_interval = 3600` `segment_cache_capacity = 20000` | Reduce memory usage by other caches. | +| - | `inverted_index_ram_dir_enable = true` | Reduce the IO overhead caused by writing to index files temporarily. | | Thread | `pipeline_executor_size = 24` `doris_scanner_thread_pool_thread_num = 48` | Configure computing threads and I/O threads for a 32-core CPU in proportion to core count. | | - | `scan_thread_nice_value = 5` | Lower the priority of query I/O threads to ensure writing performance and timeliness. | | Other | `string_type_length_soft_limit_bytes = 10485760` | Increase the length limit of string-type data to 10 MB. | @@ -238,7 +241,7 @@ Due to the distinct characteristics of both writing and querying log data, it is - For data partitioning: - - Enable [range partitioning](https://doris.apache.org/docs/2.0/table-design/data-partition#range-partition) with [dynamic partitions](https://doris.apache.org/docs/2.0/table-design/data-partition#dynamic-partition) managed automatically by day. + - Enable [range partitioning](https://doris.apache.org/docs/table-design/data-partition#range-partition) with [dynamic partitions](https://doris.apache.org/docs/table-design/data-partition#dynamic-partition) managed automatically by day. - Use a field in the DATETIME type as the key for accelerated retrieval of the latest N log entries. @@ -274,7 +277,7 @@ Configure storage policies as follows: - Configure the storage location for log_s3 and set the log_policy_3day policy, where the data is cooled and moved to the specified storage location of log_s3 after 3 days. Refer to the code below. 
-```Go +```SQL CREATE DATABASE log_db; USE log_db; @@ -311,6 +314,7 @@ DUPLICATE KEY(`ts`) PARTITION BY RANGE(`ts`) () DISTRIBUTED BY RANDOM BUCKETS 250 PROPERTIES ( +"compaction_policy" = "time_series", "dynamic_partition.enable" = "true", "dynamic_partition.create_history_partition" = "true", "dynamic_partition.time_unit" = "DAY", @@ -318,11 +322,10 @@ PROPERTIES ( "dynamic_partition.end" = "1", "dynamic_partition.prefix" = "p", "dynamic_partition.buckets" = "250", -"dynamic_partition.replication_num" = "1", -- unneccessary for the compute-storage coupled mode -"replication_num" = "1" -- unneccessary for the compute-storage coupled mode +"dynamic_partition.replication_num" = "2", -- unneccessary for the compute-storage coupled mode +"replication_num" = "2" -- unneccessary for the compute-storage coupled mode "enable_single_replica_compaction" = "true", -- unneccessary for the compute-storage coupled mode -"storage_policy" = "log_policy_3day", -- unneccessary for the compute-storage coupled mode -"compaction_policy" = "time_series" +"storage_policy" = "log_policy_3day" -- unneccessary for the compute-storage coupled mode ); ``` @@ -348,53 +351,56 @@ Follow these steps: 2. Configure Logstash. Specify the following fields: - - `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance. +- `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance. - ```Plain Text - pipeline.batch.size: 1000000 - pipeline.batch.delay: 10000 - ``` +```Plain Text +pipeline.batch.size: 1000000 +pipeline.batch.delay: 10000 +``` - - `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris. +- `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris. - ```markdown - input { +``` +input { file { path => "/path/to/your/log" - } - } - <br />output { - doris { - http_hosts => \[ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port"\>] + } +} + +output { + doris { + http_hosts => [ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port">] user => "your_username" password => "your_password" db => "your_db" table => "your_table" - \# doris stream load http headers + + # doris stream load http headers headers => { "format" => "json" "read_json_by_line" => "true" "load_to_single_tablet" => "true" } - \# field mapping: doris fileld name => logstash field name - \# %{} to get a logstash field, \[\] for nested field such as \[host\]\[name\] for host.name + + # field mapping: doris fileld name => logstash field name + # %{} to get a logstash field, [] for nested field such as [host][name] for host.name mapping => { "ts" => "%{@timestamp}" - "host" => "%{\[host\]\[name\]}" - "path" => "%{\[log\]\[file\]\[path\]}" + "host" => "%{[host][name]}" + "path" => "%{[log][file][path]}" "message" => "%{message}" } log_request => true log_speed_interval => 10 - } - } + } +} ``` 3. Run Logstash according to the command below, collect logs, and output to Apache Doris. - ```Bash - ./bin/logstash -f logstash_demo.conf - ``` +```Bash +./bin/logstash -f logstash_demo.conf +``` For more information about the Logstash Doris Output plugin, see [Logstash Doris Output Plugin](../ecosystem/logstash.md). @@ -406,56 +412,56 @@ Follow these steps: 2. Configure Filebeat. 
Specify the filebeat_demo.yml field that is used to configure the specific input path of the collected logs and the settings for output to Apache Doris. - ```YAML - # input - filebeat.inputs: - - type: log - enabled: true - paths: - - /path/to/your/log - multiline: - type: pattern - pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' - negate: true - match: after - skip_newline: true - - processors: - - script: - lang: javascript - source: > - function process(event) { - var msg = event.Get("message"); - msg = msg.replace(/\t/g, " "); - event.Put("message", msg); - } - - dissect: - # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle - tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" - target_prefix: "" - ignore_failure: true - overwrite_keys: true - - # queue and batch - queue.mem: - events: 1000000 - flush.min_events: 100000 - flush.timeout: 10s - - # output - output.doris: - fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] - user: "your_username" - password: "your_password" - database: "your_db" - table: "your_table" - # output string format - codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' - headers: - format: "json" - read_json_by_line: "true" - load_to_single_tablet: "true" - ``` +```YAML +# input +filebeat.inputs: +- type: log +enabled: true +paths: + - /path/to/your/log +multiline: + type: pattern + pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' + negate: true + match: after + skip_newline: true + +processors: +- script: + lang: javascript + source: > + function process(event) { + var msg = event.Get("message"); + msg = msg.replace(/\t/g, " "); + event.Put("message", msg); + } +- dissect: + # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle + tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" + target_prefix: "" + ignore_failure: true + overwrite_keys: true + +# queue and batch +queue.mem: +events: 1000000 +flush.min_events: 100000 +flush.timeout: 10s + +# output +output.doris: +fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] +user: "your_username" +password: "your_password" +database: "your_db" +table: "your_table" +# output string format +codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' +headers: + format: "json" + read_json_by_line: "true" + load_to_single_tablet: "true" +``` 3. Run Filebeat according to the command below, collect logs, and output to Apache Doris. @@ -470,7 +476,7 @@ For more information about Filebeat, refer to [Beats Doris Output Plugin](../eco Write JSON formatted logs to Kafka's message queue, create a Kafka Routine Load, and allow Apache Doris to actively pull data from Kafka. -You can refer to the example below, where `property.\*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. +You can refer to the example below, where `property.*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. 
```SQL CREATE ROUTINE LOAD load_log_kafka ON log_db.log_table @@ -486,7 +492,7 @@ PROPERTIES ( ) FROM KAFKA ( "kafka_broker_list" = "host:port", -"kafka_topic" = "log_\_topic_", +"kafka_topic" = "log__topic_", "property.group.id" = "your_group_id", "property.security.protocol"="SASL_PLAINTEXT", "property.sasl.mechanism"="GSSAPI", @@ -494,7 +500,7 @@ FROM KAFKA ( "property.sasl.kerberos.keytab"="/path/to/xxx.keytab", "property.sasl.kerberos.principal"="<x...@yyy.com>" ); -<br/>SHOW ROUTINE LOAD; +<br />SHOW ROUTINE LOAD; ``` For more information about Kafka, see [Routine Load](../data-operate/import/routine-load-manual.md)。 @@ -504,15 +510,15 @@ For more information about Kafka, see [Routine Load](../data-operate/import/rout In addition to integrating common log collectors, you can also customize programs to import log data into Apache Doris using the Stream Load HTTP API. Refer to the following code: ```Bash -curl \\ -\--location-trusted \\ -\-u username:password \\ -\-H "format:json" \\ -\-H "read_json_by_line:true" \\ -\-H "load_to_single_tablet:true" \\ -\-H "timeout:600" \\ -\-T logfile.json \\ -http://fe_host:fe_http_port/api/log_db/log_table/\_stream_load +curl +--location-trusted +-u username:password +-H "format:json" +-H "read_json_by_line:true" +-H "load_to_single_tablet:true" +-H "timeout:600" +-T logfile.json +http://fe_host:fe_http_port/api/log_db/log_table/_stream_load ``` When using custom programs, pay attention to the following key points: @@ -542,33 +548,33 @@ Here are 5 common SQL query commands for reference: - View the latest 10 log entries ```SQL -SELECT \* FROM your_table_name ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name ORDER BY ts DESC LIMIT 10; ``` - Query the latest 10 log entries with the host as 8.8.8.8 ```SQL -SELECT \* FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with error or 404 in the request field. In the command below, MATCH_ANY is a full-text search SQL syntax used by Apache Doris for matching any keyword in the fields. ```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_ANY** 'error 404' +SELECT * FROM your_table_name WHERE message **MATCH_ANY** 'error 404' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with image and faq in the request field. In the command below, MATCH_ALL is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields. ```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_ALL** 'image faq' +SELECT * FROM your_table_name WHERE message **MATCH_ALL** 'image faq' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 entries with image and faq in the request field. In the following command, MATCH_PHRASE is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields and requiring consistent order. In the example below, a image faq b can match, but a faq image b cannot match because the order of image and faq does not match the syntax. 
```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq' +SELECT * FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq' ORDER BY ts DESC LIMIT 10; ``` diff --git a/versioned_docs/version-2.1/practical-guide/log-storage-analysis.md b/versioned_docs/version-2.1/practical-guide/log-storage-analysis.md index 61657d03987..51ff5cead14 100644 --- a/versioned_docs/version-2.1/practical-guide/log-storage-analysis.md +++ b/versioned_docs/version-2.1/practical-guide/log-storage-analysis.md @@ -1,6 +1,6 @@ --- { - "title": "Log Storage and Analysis", + "title": "Building log analysis platform", "language": "en" } --- @@ -171,7 +171,7 @@ Refer to the following table to learn about the values of indicators in the exam | Percent of CPU resources reserved for data querying | 50% | Specify the value according to your actual needs. The default value is 50%. | | Estimated number of BE servers | 15.2 | Calculation formula: Number of CPU cores for the peak write throughput / Number of CPU cores of a BE server /(1 - Percent of CPU resources reserved for data querying) | | Rounded number of BE servers | 15 | Calculation formula: MAX (Number of data copies, Estimated number of BE servers) | -| Estimated data storage space for each BE server (TB) | 4.03 | Calculation formula: Estimated storage space for hot data / Estimated number of BE servers /(1 - 30%), where 30% represents the percent of reserved storage space.<br/><br/>It is recommended to mount 4 to 12 data disks on each BE server to enhance I/O capabilities. | +| Estimated data storage space for each BE server (TB) | 4.03 | Calculation formula: Estimated storage space for hot data / Estimated number of BE servers /(1 - 30%), where 30% represents the percent of reserved storage space.<br /><br />It is recommended to mount 4 to 12 data disks on each BE server to enhance I/O capabilities. | ### Step 2: Deploy the cluster @@ -194,7 +194,7 @@ You can find FE configuration fields in `fe/conf/fe.conf`. Refer to the followin | Configuration fields to be optimized | Description | | :----------------------------------------------------------- | :----------------------------------------------------------- | | `max_running_txn_num_per_db = 10000` | Increase the parameter value to adapt to high-concurrency import transactions. | -| `streaming_label_keep_max_second = 3600``label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | +| `streaming_label_keep_max_second = 3600` `label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | | `enable_round_robin_create_tablet = true` | When creating Tablets, use a Round Robin strategy to distribute evenly. | | `tablet_rebalancer_type = partition` | When balancing Tablets, use a strategy to evenly distribute within each partition. | | `enable_single_replica_load = true` | Enable single-replica import, where multiple replicas only need to build an index once to reduce CPU consumption. | @@ -218,10 +218,13 @@ You can find BE configuration fields in `be/conf/be.conf`. Refer to the followin | Compaction | `max_cumu_compaction_threads = 8` | Set to CPU core count / 4, indicating that 1/4 of CPU resources are used for writing, 1/4 for background compaction, and 2/1 for queries and other operations. | | - | `inverted_index_compaction_enable = true` | Enable inverted index compaction to reduce CPU consumption during compaction. 
| | - | `enable_segcompaction = false` `enable_ordered_data_compaction = false` | Disable two compaction features that are unnecessary for log scenarios. | +| - | `enable_compaction_priority_scheduling = false` | Low-priority compaction is limited to 2 tasks on a single disk, which can affect the speed of compaction. | +| - | `total_permits_for_compaction_score = 200000 ` | The parameter is used to control memory, under the memory time series strategy, the parameter itself can control memory. | | Cache | `disable_storage_page_cache = true` `inverted_index_searcher_cache_limit = 30%` | Due to the large volume of log data and limited caching effect, switch from data caching to index caching. | | - | `inverted_index_cache_stale_sweep_time_sec = 3600` `index_cache_entry_stay_time_after_lookup_s = 3600` | Maintain index caching in memory for up to 1 hour. | | - | `enable_inverted_index_cache_on_cooldown = true`<br />`enable_write_index_searcher_cache = false` | Enable automatic caching of cold data storage during index uploading. | | - | `tablet_schema_cache_recycle_interval = 3600` `segment_cache_capacity = 20000` | Reduce memory usage by other caches. | +| - | `inverted_index_ram_dir_enable = true` | Reduce the IO overhead caused by writing to index files temporarily. | | Thread | `pipeline_executor_size = 24` `doris_scanner_thread_pool_thread_num = 48` | Configure computing threads and I/O threads for a 32-core CPU in proportion to core count. | | - | `scan_thread_nice_value = 5` | Lower the priority of query I/O threads to ensure writing performance and timeliness. | | Other | `string_type_length_soft_limit_bytes = 10485760` | Increase the length limit of string-type data to 10 MB. | @@ -274,7 +277,7 @@ Configure storage policies as follows: - Configure the storage location for log_s3 and set the log_policy_3day policy, where the data is cooled and moved to the specified storage location of log_s3 after 3 days. Refer to the code below. -```Go +```SQL CREATE DATABASE log_db; USE log_db; @@ -311,6 +314,7 @@ DUPLICATE KEY(`ts`) PARTITION BY RANGE(`ts`) () DISTRIBUTED BY RANDOM BUCKETS 250 PROPERTIES ( +"compaction_policy" = "time_series", "dynamic_partition.enable" = "true", "dynamic_partition.create_history_partition" = "true", "dynamic_partition.time_unit" = "DAY", @@ -318,11 +322,10 @@ PROPERTIES ( "dynamic_partition.end" = "1", "dynamic_partition.prefix" = "p", "dynamic_partition.buckets" = "250", -"dynamic_partition.replication_num" = "1", -- unneccessary for the compute-storage coupled mode -"replication_num" = "1" -- unneccessary for the compute-storage coupled mode +"dynamic_partition.replication_num" = "2", -- unneccessary for the compute-storage coupled mode +"replication_num" = "2" -- unneccessary for the compute-storage coupled mode "enable_single_replica_compaction" = "true", -- unneccessary for the compute-storage coupled mode -"storage_policy" = "log_policy_3day", -- unneccessary for the compute-storage coupled mode -"compaction_policy" = "time_series" +"storage_policy" = "log_policy_3day" -- unneccessary for the compute-storage coupled mode ); ``` @@ -348,53 +351,56 @@ Follow these steps: 2. Configure Logstash. Specify the following fields: - - `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance. +- `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance. 
- ```Plain Text - pipeline.batch.size: 1000000 - pipeline.batch.delay: 10000 - ``` +```Plain Text +pipeline.batch.size: 1000000 +pipeline.batch.delay: 10000 +``` - - `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris. +- `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris. - ```markdown - input { +``` +input { file { path => "/path/to/your/log" - } - } - <br />output { - doris { - http_hosts => \[ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port"\>] + } +} + +output { + doris { + http_hosts => [ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port">] user => "your_username" password => "your_password" db => "your_db" table => "your_table" - \# doris stream load http headers + + # doris stream load http headers headers => { "format" => "json" "read_json_by_line" => "true" "load_to_single_tablet" => "true" } - \# field mapping: doris fileld name => logstash field name - \# %{} to get a logstash field, \[\] for nested field such as \[host\]\[name\] for host.name + + # field mapping: doris fileld name => logstash field name + # %{} to get a logstash field, [] for nested field such as [host][name] for host.name mapping => { "ts" => "%{@timestamp}" - "host" => "%{\[host\]\[name\]}" - "path" => "%{\[log\]\[file\]\[path\]}" + "host" => "%{[host][name]}" + "path" => "%{[log][file][path]}" "message" => "%{message}" } log_request => true log_speed_interval => 10 - } - } + } +} ``` 3. Run Logstash according to the command below, collect logs, and output to Apache Doris. - ```Bash - ./bin/logstash -f logstash_demo.conf - ``` +```Bash +./bin/logstash -f logstash_demo.conf +``` For more information about the Logstash Doris Output plugin, see [Logstash Doris Output Plugin](../ecosystem/logstash.md). @@ -406,56 +412,56 @@ Follow these steps: 2. Configure Filebeat. Specify the filebeat_demo.yml field that is used to configure the specific input path of the collected logs and the settings for output to Apache Doris. 
- ```YAML - # input - filebeat.inputs: - - type: log - enabled: true - paths: - - /path/to/your/log - multiline: - type: pattern - pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' - negate: true - match: after - skip_newline: true - - processors: - - script: - lang: javascript - source: > - function process(event) { - var msg = event.Get("message"); - msg = msg.replace(/\t/g, " "); - event.Put("message", msg); - } - - dissect: - # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle - tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" - target_prefix: "" - ignore_failure: true - overwrite_keys: true - - # queue and batch - queue.mem: - events: 1000000 - flush.min_events: 100000 - flush.timeout: 10s - - # output - output.doris: - fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] - user: "your_username" - password: "your_password" - database: "your_db" - table: "your_table" - # output string format - codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' - headers: - format: "json" - read_json_by_line: "true" - load_to_single_tablet: "true" - ``` +```YAML +# input +filebeat.inputs: +- type: log +enabled: true +paths: + - /path/to/your/log +multiline: + type: pattern + pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' + negate: true + match: after + skip_newline: true + +processors: +- script: + lang: javascript + source: > + function process(event) { + var msg = event.Get("message"); + msg = msg.replace(/\t/g, " "); + event.Put("message", msg); + } +- dissect: + # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle + tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" + target_prefix: "" + ignore_failure: true + overwrite_keys: true + +# queue and batch +queue.mem: +events: 1000000 +flush.min_events: 100000 +flush.timeout: 10s + +# output +output.doris: +fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] +user: "your_username" +password: "your_password" +database: "your_db" +table: "your_table" +# output string format +codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' +headers: + format: "json" + read_json_by_line: "true" + load_to_single_tablet: "true" +``` 3. Run Filebeat according to the command below, collect logs, and output to Apache Doris. @@ -470,7 +476,7 @@ For more information about Filebeat, refer to [Beats Doris Output Plugin](../eco Write JSON formatted logs to Kafka's message queue, create a Kafka Routine Load, and allow Apache Doris to actively pull data from Kafka. -You can refer to the example below, where `property.\*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. +You can refer to the example below, where `property.*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. 
```SQL CREATE ROUTINE LOAD load_log_kafka ON log_db.log_table @@ -486,7 +492,7 @@ PROPERTIES ( ) FROM KAFKA ( "kafka_broker_list" = "host:port", -"kafka_topic" = "log_\_topic_", +"kafka_topic" = "log__topic_", "property.group.id" = "your_group_id", "property.security.protocol"="SASL_PLAINTEXT", "property.sasl.mechanism"="GSSAPI", @@ -494,7 +500,7 @@ FROM KAFKA ( "property.sasl.kerberos.keytab"="/path/to/xxx.keytab", "property.sasl.kerberos.principal"="<x...@yyy.com>" ); -<br/>SHOW ROUTINE LOAD; +<br />SHOW ROUTINE LOAD; ``` For more information about Kafka, see [Routine Load](../data-operate/import/routine-load-manual.md)。 @@ -504,15 +510,15 @@ For more information about Kafka, see [Routine Load](../data-operate/import/rout In addition to integrating common log collectors, you can also customize programs to import log data into Apache Doris using the Stream Load HTTP API. Refer to the following code: ```Bash -curl \\ -\--location-trusted \\ -\-u username:password \\ -\-H "format:json" \\ -\-H "read_json_by_line:true" \\ -\-H "load_to_single_tablet:true" \\ -\-H "timeout:600" \\ -\-T logfile.json \\ -http://fe_host:fe_http_port/api/log_db/log_table/\_stream_load +curl +--location-trusted +-u username:password +-H "format:json" +-H "read_json_by_line:true" +-H "load_to_single_tablet:true" +-H "timeout:600" +-T logfile.json +http://fe_host:fe_http_port/api/log_db/log_table/_stream_load ``` When using custom programs, pay attention to the following key points: @@ -542,33 +548,33 @@ Here are 5 common SQL query commands for reference: - View the latest 10 log entries ```SQL -SELECT \* FROM your_table_name ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name ORDER BY ts DESC LIMIT 10; ``` - Query the latest 10 log entries with the host as 8.8.8.8 ```SQL -SELECT \* FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with error or 404 in the request field. In the command below, MATCH_ANY is a full-text search SQL syntax used by Apache Doris for matching any keyword in the fields. ```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_ANY** 'error 404' +SELECT * FROM your_table_name WHERE message **MATCH_ANY** 'error 404' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with image and faq in the request field. In the command below, MATCH_ALL is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields. ```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_ALL** 'image faq' +SELECT * FROM your_table_name WHERE message **MATCH_ALL** 'image faq' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 entries with image and faq in the request field. In the following command, MATCH_PHRASE is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields and requiring consistent order. In the example below, a image faq b can match, but a faq image b cannot match because the order of image and faq does not match the syntax. 
```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq' +SELECT * FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq' ORDER BY ts DESC LIMIT 10; ``` diff --git a/versioned_docs/version-3.0/practical-guide/log-storage-analysis.md b/versioned_docs/version-3.0/practical-guide/log-storage-analysis.md index 99d251de708..51ff5cead14 100644 --- a/versioned_docs/version-3.0/practical-guide/log-storage-analysis.md +++ b/versioned_docs/version-3.0/practical-guide/log-storage-analysis.md @@ -1,6 +1,6 @@ --- { - "title": "Log Storage and Analysis", + "title": "Building log analysis platform", "language": "en" } --- @@ -194,7 +194,7 @@ You can find FE configuration fields in `fe/conf/fe.conf`. Refer to the followin | Configuration fields to be optimized | Description | | :----------------------------------------------------------- | :----------------------------------------------------------- | | `max_running_txn_num_per_db = 10000` | Increase the parameter value to adapt to high-concurrency import transactions. | -| `streaming_label_keep_max_second = 3600``label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | +| `streaming_label_keep_max_second = 3600` `label_keep_max_second = 7200` | Increase the retention time to handle high-frequency import transactions with high memory usage. | | `enable_round_robin_create_tablet = true` | When creating Tablets, use a Round Robin strategy to distribute evenly. | | `tablet_rebalancer_type = partition` | When balancing Tablets, use a strategy to evenly distribute within each partition. | | `enable_single_replica_load = true` | Enable single-replica import, where multiple replicas only need to build an index once to reduce CPU consumption. | @@ -218,10 +218,13 @@ You can find BE configuration fields in `be/conf/be.conf`. Refer to the followin | Compaction | `max_cumu_compaction_threads = 8` | Set to CPU core count / 4, indicating that 1/4 of CPU resources are used for writing, 1/4 for background compaction, and 2/1 for queries and other operations. | | - | `inverted_index_compaction_enable = true` | Enable inverted index compaction to reduce CPU consumption during compaction. | | - | `enable_segcompaction = false` `enable_ordered_data_compaction = false` | Disable two compaction features that are unnecessary for log scenarios. | +| - | `enable_compaction_priority_scheduling = false` | Low-priority compaction is limited to 2 tasks on a single disk, which can affect the speed of compaction. | +| - | `total_permits_for_compaction_score = 200000 ` | The parameter is used to control memory, under the memory time series strategy, the parameter itself can control memory. | | Cache | `disable_storage_page_cache = true` `inverted_index_searcher_cache_limit = 30%` | Due to the large volume of log data and limited caching effect, switch from data caching to index caching. | | - | `inverted_index_cache_stale_sweep_time_sec = 3600` `index_cache_entry_stay_time_after_lookup_s = 3600` | Maintain index caching in memory for up to 1 hour. | | - | `enable_inverted_index_cache_on_cooldown = true`<br />`enable_write_index_searcher_cache = false` | Enable automatic caching of cold data storage during index uploading. | | - | `tablet_schema_cache_recycle_interval = 3600` `segment_cache_capacity = 20000` | Reduce memory usage by other caches. | +| - | `inverted_index_ram_dir_enable = true` | Reduce the IO overhead caused by writing to index files temporarily. 
| Thread | `pipeline_executor_size = 24` `doris_scanner_thread_pool_thread_num = 48` | Configure computing threads and I/O threads for a 32-core CPU in proportion to core count. |
| - | `scan_thread_nice_value = 5` | Lower the priority of query I/O threads to ensure writing performance and timeliness. |
| Other | `string_type_length_soft_limit_bytes = 10485760` | Increase the length limit of string-type data to 10 MB. |
@@ -274,7 +277,7 @@ Configure storage policies as follows:
- Configure the storage location for log_s3 and set the log_policy_3day policy, where the data is cooled and moved to the specified storage location of log_s3 after 3 days. Refer to the code below.
-```Go
+```SQL
CREATE DATABASE log_db;
USE log_db;
@@ -311,6 +314,7 @@ DUPLICATE KEY(`ts`)
PARTITION BY RANGE(`ts`) ()
DISTRIBUTED BY RANDOM BUCKETS 250
PROPERTIES (
+"compaction_policy" = "time_series",
"dynamic_partition.enable" = "true",
"dynamic_partition.create_history_partition" = "true",
"dynamic_partition.time_unit" = "DAY",
@@ -318,11 +322,10 @@ PROPERTIES (
"dynamic_partition.end" = "1",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "250",
-"dynamic_partition.replication_num" = "1", -- unnecessary for the compute-storage coupled mode
-"replication_num" = "1" -- unnecessary for the compute-storage coupled mode
+"dynamic_partition.replication_num" = "2", -- unnecessary for the compute-storage coupled mode
+"replication_num" = "2", -- unnecessary for the compute-storage coupled mode
"enable_single_replica_compaction" = "true", -- unnecessary for the compute-storage coupled mode
-"storage_policy" = "log_policy_3day", -- unnecessary for the compute-storage coupled mode
-"compaction_policy" = "time_series"
+"storage_policy" = "log_policy_3day" -- unnecessary for the compute-storage coupled mode
);
```
@@ -348,53 +351,56 @@ Follow these steps:
2. Configure Logstash. Specify the following fields:
- - `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance.
+- `logstash.yml`: Used to configure Logstash batch processing log sizes and timings for improved data writing performance.
- ```Plain Text
- pipeline.batch.size: 1000000
- pipeline.batch.delay: 10000
- ```
+```Plain Text
+pipeline.batch.size: 1000000
+pipeline.batch.delay: 10000
```
- - `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris.
+- `logstash_demo.conf`: Used to configure the specific input path of the collected logs and the settings for output to Apache Doris.
- ```markdown - input { +``` +input { file { path => "/path/to/your/log" - } - } - <br />output { - doris { - http_hosts => \[ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port"\>] + } +} + +output { + doris { + http_hosts => [ "<http://fehost1:http_port>", "<http://fehost2:http_port>", "<http://fehost3:http_port">] user => "your_username" password => "your_password" db => "your_db" table => "your_table" - \# doris stream load http headers + + # doris stream load http headers headers => { "format" => "json" "read_json_by_line" => "true" "load_to_single_tablet" => "true" } - \# field mapping: doris fileld name => logstash field name - \# %{} to get a logstash field, \[\] for nested field such as \[host\]\[name\] for host.name + + # field mapping: doris fileld name => logstash field name + # %{} to get a logstash field, [] for nested field such as [host][name] for host.name mapping => { "ts" => "%{@timestamp}" - "host" => "%{\[host\]\[name\]}" - "path" => "%{\[log\]\[file\]\[path\]}" + "host" => "%{[host][name]}" + "path" => "%{[log][file][path]}" "message" => "%{message}" } log_request => true log_speed_interval => 10 - } - } + } +} ``` 3. Run Logstash according to the command below, collect logs, and output to Apache Doris. - ```Bash - ./bin/logstash -f logstash_demo.conf - ``` +```Bash +./bin/logstash -f logstash_demo.conf +``` For more information about the Logstash Doris Output plugin, see [Logstash Doris Output Plugin](../ecosystem/logstash.md). @@ -406,56 +412,56 @@ Follow these steps: 2. Configure Filebeat. Specify the filebeat_demo.yml field that is used to configure the specific input path of the collected logs and the settings for output to Apache Doris. - ```YAML - # input - filebeat.inputs: - - type: log - enabled: true - paths: - - /path/to/your/log - multiline: - type: pattern - pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' - negate: true - match: after - skip_newline: true - - processors: - - script: - lang: javascript - source: > - function process(event) { - var msg = event.Get("message"); - msg = msg.replace(/\t/g, " "); - event.Put("message", msg); - } - - dissect: - # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle - tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" - target_prefix: "" - ignore_failure: true - overwrite_keys: true - - # queue and batch - queue.mem: - events: 1000000 - flush.min_events: 100000 - flush.timeout: 10s - - # output - output.doris: - fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] - user: "your_username" - password: "your_password" - database: "your_db" - table: "your_table" - # output string format - codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' - headers: - format: "json" - read_json_by_line: "true" - load_to_single_tablet: "true" - ``` +```YAML +# input +filebeat.inputs: +- type: log +enabled: true +paths: + - /path/to/your/log +multiline: + type: pattern + pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' + negate: true + match: after + skip_newline: true + +processors: +- script: + lang: javascript + source: > + function process(event) { + var msg = event.Get("message"); + msg = msg.replace(/\t/g, " "); + event.Put("message", msg); + } +- dissect: + # 2024-06-08 18:26:25,481 INFO (report-thread|199) [ReportHandler.cpuReport():617] begin to handle + 
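+  # The tokenizer below splits the sample log line above into:
+  #   day = 2024-06-08, time = 18:26:25,481, log_level = INFO, thread = report-thread|199,
+  #   position = ReportHandler.cpuReport():617, content = begin to handle
+  # day and time are recombined into the ts field by codec_format_string in output.doris below.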
tokenizer: "%{day} %{time} %{log_level} (%{thread}) [%{position}] %{content}" + target_prefix: "" + ignore_failure: true + overwrite_keys: true + +# queue and batch +queue.mem: +events: 1000000 +flush.min_events: 100000 +flush.timeout: 10s + +# output +output.doris: +fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ] +user: "your_username" +password: "your_password" +database: "your_db" +table: "your_table" +# output string format +codec_format_string: '{"ts": "%{[day]} %{[time]}", "host": "%{[agent][hostname]}", "path": "%{[log][file][path]}", "message": "%{[message]}"}' +headers: + format: "json" + read_json_by_line: "true" + load_to_single_tablet: "true" +``` 3. Run Filebeat according to the command below, collect logs, and output to Apache Doris. @@ -470,7 +476,7 @@ For more information about Filebeat, refer to [Beats Doris Output Plugin](../eco Write JSON formatted logs to Kafka's message queue, create a Kafka Routine Load, and allow Apache Doris to actively pull data from Kafka. -You can refer to the example below, where `property.\*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. +You can refer to the example below, where `property.*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation. ```SQL CREATE ROUTINE LOAD load_log_kafka ON log_db.log_table @@ -486,7 +492,7 @@ PROPERTIES ( ) FROM KAFKA ( "kafka_broker_list" = "host:port", -"kafka_topic" = "log_\_topic_", +"kafka_topic" = "log__topic_", "property.group.id" = "your_group_id", "property.security.protocol"="SASL_PLAINTEXT", "property.sasl.mechanism"="GSSAPI", @@ -504,15 +510,15 @@ For more information about Kafka, see [Routine Load](../data-operate/import/rout In addition to integrating common log collectors, you can also customize programs to import log data into Apache Doris using the Stream Load HTTP API. Refer to the following code: ```Bash -curl \\ -\--location-trusted \\ -\-u username:password \\ -\-H "format:json" \\ -\-H "read_json_by_line:true" \\ -\-H "load_to_single_tablet:true" \\ -\-H "timeout:600" \\ -\-T logfile.json \\ -http://fe_host:fe_http_port/api/log_db/log_table/\_stream_load +curl +--location-trusted +-u username:password +-H "format:json" +-H "read_json_by_line:true" +-H "load_to_single_tablet:true" +-H "timeout:600" +-T logfile.json +http://fe_host:fe_http_port/api/log_db/log_table/_stream_load ``` When using custom programs, pay attention to the following key points: @@ -542,33 +548,33 @@ Here are 5 common SQL query commands for reference: - View the latest 10 log entries ```SQL -SELECT \* FROM your_table_name ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name ORDER BY ts DESC LIMIT 10; ``` - Query the latest 10 log entries with the host as 8.8.8.8 ```SQL -SELECT \* FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; +SELECT * FROM your_table_name WHERE host = '8.8.8.8' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with error or 404 in the request field. In the command below, MATCH_ANY is a full-text search SQL syntax used by Apache Doris for matching any keyword in the fields. ```SQL -SELECT \* FROM your_table_name WHERE message **MATCH_ANY** 'error 404' +SELECT * FROM your_table_name WHERE message **MATCH_ANY** 'error 404' ORDER BY ts DESC LIMIT 10; ``` - Retrieve the latest 10 log entries with image and faq in the request field. 
In the command below, MATCH_ALL is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields.
```SQL
-SELECT \* FROM your_table_name WHERE message **MATCH_ALL** 'image faq'
+SELECT * FROM your_table_name WHERE message **MATCH_ALL** 'image faq'
ORDER BY ts DESC LIMIT 10;
```
- Retrieve the latest 10 entries with image and faq in the request field. In the following command, MATCH_PHRASE is a full-text search SQL syntax used by Apache Doris for matching all keywords in the fields and requiring consistent order. In the example below, a image faq b can match, but a faq image b cannot match because the order of image and faq does not match the syntax.
```SQL
-SELECT \* FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq'
+SELECT * FROM your_table_name WHERE message **MATCH_PHRASE** 'image faq'
ORDER BY ts DESC LIMIT 10;
```
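As a quick way to try the queries above, any MySQL-compatible client works, since the Doris FE serves the MySQL protocol on its query port (9030 by default). A minimal sketch, reusing the placeholder host, credentials, and table names from the examples in this document:

```Bash
# Connect to the FE over the MySQL protocol and run one of the full-text queries above.
# fe_host, username, log_db, and your_table_name are placeholders taken from the examples in this document.
mysql -h fe_host -P 9030 -u username -p -D log_db \
  -e "SELECT * FROM your_table_name WHERE message MATCH_ANY 'error 404' ORDER BY ts DESC LIMIT 10;"
```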