yiguolei commented on code in PR #3453:
URL: https://github.com/apache/doris-website/pull/3453#discussion_r2922644812


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/performance-large-scale.md:
##########
@@ -0,0 +1,171 @@
+---
+{
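+    "title": "Large-Scale Performance Benchmark",
+    "language": "zh-CN",
+    "description": "Summarizes large-scale import and query performance test results for Doris ANN Index in single-node and distributed environments."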
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
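+This document summarizes large-scale test results for both single-node and distributed deployments. The tests aim to show Doris query performance at different data scales, and how vector query capability can be extended from a single node to a distributed deployment as data volume keeps growing.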
+
+## Test Matrix
+
+- Single node: FE and BE deployed separately; BE uses one 16C64GB machine.
+- Distributed: 3 BEs, each 16C64GB.
+- Test datasets:
+  - Performance768D10M
+  - Performance1536D5M
+  - Performance768D100M
+
+## Single-Node Results (16C64GB)
+
+The single-node results provide a baseline for ANN query performance on medium-to-large datasets.
+
+### Import Performance
+
+| Item | Performance768D10M | Performance1536D5M |
+|------|---------------------|--------------------|
+| Vector dimension | 768 | 1536 |
+| metric_type | inner_product | inner_product |
+| Row count | 10M rows | 5M rows |
+| Import batch parameters | `NUM_PER_BATCH=500000`<br/>`--stream-load-rows-per-batch 500000` | `NUM_PER_BATCH=250000`<br/>`--stream-load-rows-per-batch 250000` |
+| Import time | 76m41s | 41m |
+| `show data all` | 56.498 GB (25.354 GB + 31.145 GB) | 55.223 GB (25.346 GB + 29.878 GB) |
+
+CPU monitoring during the Performance768D10M import is shown below. CPU usage stayed fairly steady throughout the import.
+
+<img src="/images/vector-search/Performance768D-CPU-Import.png" alt="Performance768D10M import CPU" width="900" height="435" />
+
+Performance1536D5M has less data and uses a smaller import batch size, so CPU usage fluctuates more frequently during the import phase.
+
+<img src="/images/vector-search/Performance1536D5M-CPU-Import.png" alt="Performance1536D5M import CPU" width="900" height="432" />
+
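+### Query Performance
+
+The results from the two single-node workloads show that Doris reaches several hundred QPS while maintaining high recall and low query latency.
+
+#### Summary
+
+| Dataset | BestQPS | Recall@100 |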
+|--------|---------|------------|
+| Performance768D10M | 481.9356 | 0.9207 |

Review Comment:
   Our datasets are fixed and differ only in row count, so here we should list only the row count.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

