This is an automated email from the ASF dual-hosted git repository.

yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
     new 3cc06820c4 [doc](performance) performance doc and script update (#12493)
3cc06820c4 is described below

commit 3cc06820c4d6187b61154341fa27c59c9708d930
Author: Dongyang Li <hello_step...@qq.com>
AuthorDate: Fri Sep 9 09:09:49 2022 +0800

    [doc](performance) performance doc and script update (#12493)
---
 docs/en/docs/benchmark/ssb.md               | 88 +++++++++++------------------
 docs/en/docs/benchmark/tpch.md              | 26 ++++-----
 docs/zh-CN/docs/benchmark/ssb.md            | 80 ++++++++++----------------
 docs/zh-CN/docs/benchmark/tpch.md           | 30 ++++------
 tools/ssb-tools/bin/gen-ssb-data.sh         |  2 +
 tools/ssb-tools/bin/run-ssb-flat-queries.sh | 48 +++++++++++++---
 tools/ssb-tools/bin/run-ssb-queries.sh      | 63 +++++++++++++++++----
 tools/tpch-tools/bin/run-tpch-queries.sh    | 10 ++--
 8 files changed, 189 insertions(+), 158 deletions(-)

diff --git a/docs/en/docs/benchmark/ssb.md b/docs/en/docs/benchmark/ssb.md
index 0f3e3abce9..faae0f3257 100644
--- a/docs/en/docs/benchmark/ssb.md
+++ b/docs/en/docs/benchmark/ssb.md
@@ -117,30 +117,30 @@ The scripts covered in the following documents are stored in `tools/ssb-tools/`
 Execute the following script to download and compile the [ssb-dbgen](https://github.com/electrum/ssb-dbgen.git) tool.
 
 ```shell
-sh build-ssb-dbgen.sh
+bash bin/build-ssb-dbgen.sh
 ````
 
-After successful installation, the `dbgen` binary will be generated in the `ssb-dbgen/` directory.
+After successful installation, the `dbgen` binary will be generated in the `bin/ssb-dbgen/` directory.
 
 ### 6.2 Generate SSB test set
 
 Execute the following script to generate the SSB dataset:
 
 ```shell
-sh gen-ssb-data.sh -s 100 -c 100
+bash bin/gen-ssb-data.sh
 ````
 
-> Note 1: See script help with `sh gen-ssb-data.sh -h`.
+> Note 1: See script help with `bash bin/gen-ssb-data.sh -h`. The default scale factor is 100 (referred to as sf100 for short). By default, 10 data files are generated, i.e. `bash bin/gen-ssb-data.sh -s 100 -c 10`, which takes about 6 minutes.
 >
-> Note 2: The data will be generated in the `ssb-data/` directory with the suffix `.tbl`. The total file size is about 60GB. The generation time may vary from a few minutes to an hour.
+> Note 2: The data will be generated in the `bin/ssb-data/` directory with the suffix `.tbl`. The total file size is about 60GB. Generation may take anywhere from several minutes to one hour, and a listing of the generated files is printed once generation completes.
 >
-> Note 3: `-s 100` indicates that the test set size factor is 100, `-c 100` indicates that 100 concurrent threads generate data for the lineorder table. The `-c` parameter also determines the number of files in the final lineorder table. The larger the parameter, the larger the number of files and the smaller each file.
+> Note 3: `-s 100` indicates that the test set scale factor is 100, `-c 10` indicates that 10 concurrent threads generate data for the lineorder table. The `-c` parameter also determines the number of files in the final lineorder table. The larger the parameter, the more files and the smaller each file. Use the default parameters to test sf100, and `-s 1000 -c 100` to test sf1000.
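Before loading anything, it can help to sanity-check what dbgen produced. A minimal sketch, assuming the default output directory `bin/ssb-data/` from the notes above (exact chunk file names may differ by dbgen version):

```shell
# List the generated SSB data files and their sizes.
ls -lh bin/ssb-data/

# Count lineorder rows across all generated chunks; at sf100 this should
# total roughly 600 million, matching the table below.
cat bin/ssb-data/lineorder.tbl* | wc -l
```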
 
 With the `-s 100` parameter, the resulting dataset size is:
 
 | Table     | Rows             | Size | File Number |
 | --------- | ---------------- | ---- | ----------- |
-| lineorder | 6亿(600037902) | 60GB | 100 |
+| lineorder | 6亿(600037902) | 60GB | 10 |
 | customer  | 300万(3000000) | 277M | 1 |
 | part      | 140万(1400000) | 116M | 1 |
 | supplier  | 20万(200000)   | 17M  | 1 |
@@ -148,40 +148,31 @@ With the `-s 100` parameter, the resulting dataset size is:
 
 ### 6.3 Create table
 
-#### 6.3.1 Prepare the `doris-cluster.conf` file.
+#### 6.3.1 Prepare the `conf/doris-cluster.conf` file.
 
-Before calling the import script, you need to write the FE's ip port and other information in the `doris-cluster.conf` file.
-
-File location and `load-ssb-dimension-data.sh` level.
+Before calling the import script, you need to write the FE's ip port and other information in the `conf/doris-cluster.conf` file.
 
 The contents of the file include FE's ip, HTTP port, user name, password and the DB name of the data to be imported:
 
 ```shell
-export FE_HOST="xxx"
+export FE_HOST="127.0.0.1"
 export FE_HTTP_PORT="8030"
 export FE_QUERY_PORT="9030"
 export USER="root"
-export PASSWORD='xxx'
+export PASSWORD=""
 export DB="ssb"
 ````
 
 #### 6.3.2 Execute the following script to generate and create the SSB table:
 
 ```shell
-sh create-ssb-tables.sh
+bash bin/create-ssb-tables.sh
 ````
 
 Or copy the build table in [create-ssb-tables.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql) Statement, executed in Doris.
+Also copy the table creation statements in [create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql) and execute them in Doris.
 
-#### 6.3.3 Execute the following script to generate and create an SSB flat table:
-
-```shell
-sh create-ssb-flat-table.sh
-````
-
-Or copy [create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql) The table building statement in , executed in Doris.
-
-Below is the `lineorder_flat` table building statement. The "lineorder_flat" table is created in the above `create-ssb-flat-table.sh` script with the default number of buckets (48 buckets). You can delete this table and adjust the number of buckets according to your cluster size node configuration, so as to obtain a better test effect.
+Below is the `lineorder_flat` table creation statement. The `lineorder_flat` table is created by the above `bin/create-ssb-flat-table.sh` script with a default of 48 buckets. You can drop this table and adjust the bucket count to match your cluster scale and node configuration for better test results.
 
 ```sql
 CREATE TABLE `lineorder_flat` (
@@ -243,38 +234,19 @@ PROPERTIES (
 );
 ```
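To see the bucket count currently in effect before deciding to recreate the table, you can print the generated DDL. A minimal sketch, assuming the default connection values from `conf/doris-cluster.conf` above:

```shell
# Print the full definition of lineorder_flat, including its
# DISTRIBUTED BY HASH(...) BUCKETS 48 clause.
mysql -h127.0.0.1 -P9030 -uroot -D ssb -e 'SHOW CREATE TABLE lineorder_flat\G'
```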
-> ### 6.4 Import data
->
-> #### 6.4.1 Import 4 dimension table data
->
-> Because the data volume of these four dimension tables (customer, part, supplier and date) is small, the import is relatively simple. We use the following command to import the data of these four tables first:
->
-> ```shell
-> sh load-ssb-dimension-data.sh
-> ````
->
-> #### 6.4.2 Import fact table lineorder.
->
-> Import the lineorder table data by the following command
->
-> ````shell
-> sh load-ssb-fact-data.sh -c 5
-> ````
->
-> `-c 5` means start 5 concurrent thread imports (default is 3). In the case of a single BE node, the import time of the lineorder data generated by `sh gen-ssb-data.sh -s 100 -c 100` using `sh load-ssb-fact-data.sh -c 3` is about 10min. Memory overhead is about 5-6GB. If you start more threads, you can speed up the import, but it will add additional memory overhead.
->
-> > Note: For faster import speed, you can restart BE after adding `flush_thread_num_per_store=5` in be.conf. This configuration indicates the number of disk write threads for each data directory, and the default is 2. Larger data can improve write data throughput, but may increase IO Util. (Reference value: 1 mechanical disk, when the default is 2, the IO Util during the import process is about 12%, and when it is set to 5, the IO Util is about 26%. If it is an SSD disk, it is almost 0) .
->
-> #### 6.4.3 Import flat table
->
-> Import the lineorder_flat table data with the following command:
+### 6.4 Import data
+
+The following script connects to Doris using the parameters in `conf/doris-cluster.conf` and performs the whole import: it loads the four small dimension tables (customer, part, supplier and date) in a single thread, loads the fact table (lineorder) concurrently, and then populates the wide table (lineorder_flat) via `INSERT INTO ... SELECT ...`.
+
+```shell
+bash bin/load-ssb-data.sh
+````
+
+> Note 1: See the script help with `bash bin/load-ssb-data.sh -h`. By default, 5 threads are started to import lineorder concurrently, i.e. `-c 5`. Starting more threads speeds up the import but adds extra memory overhead.
 >
-> ```shell
-> sh load-ssb-flat-data.sh
-> ````
+> Note 2: For faster import speed, you can restart BE after adding `flush_thread_num_per_store=5` in be.conf. This configuration sets the number of disk write threads per data directory (default 2). A larger value improves write throughput but may increase IO Util. (Reference: with 1 mechanical disk, the IO Util during import is about 12% at the default of 2 and about 26% at 5; on an SSD it is almost 0.)
 >
-> > Note: Flat table data is imported in the way of 'INSERT INTO ... SELECT ... '.
-
+> Note 3: Loading customer, part, supplier, date and lineorder takes about 389s; inserting into lineorder_flat takes about 740s.
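Once the load finishes, the row counts checked in section 6.5 below can be verified in one go. A minimal sketch, assuming the connection values from `conf/doris-cluster.conf` and the table names from the DDL scripts (the date dimension table is assumed here to be named `dates`):

```shell
# Print "<table> <row count>" for each SSB table; the numbers should match
# the generation table in section 6.2.
for t in customer part supplier dates lineorder lineorder_flat; do
    mysql -h127.0.0.1 -P9030 -uroot -D ssb -N -e "SELECT '${t}', COUNT(*) FROM ${t};"
done
```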
 
 ### 6.5 Check imported data
 
 ```sql
@@ -299,7 +271,15 @@ The amount of data should be the same as the number of rows that generate the da
 
 ### 6.6 Query test
 
-#### 6.6.1 Test SQL
+#### 6.6.1 Test script
+
+The following script connects to Doris using the parameters in `conf/doris-cluster.conf` and prints the row count of each table before executing the queries.
+
+```shell
+bash bin/run-ssb-flat-queries.sh
+```
+
+#### 6.6.2 Test SQL
 
 ```sql
 --Q1.1
diff --git a/docs/en/docs/benchmark/tpch.md b/docs/en/docs/benchmark/tpch.md
index 2fcd7d4094..d354730aac 100644
--- a/docs/en/docs/benchmark/tpch.md
+++ b/docs/en/docs/benchmark/tpch.md
@@ -73,7 +73,7 @@ The entire test simulation generates 100G of data and is imported into Doris 0.1
 
 ## 4. Test SQL
 
-TPCH 22 test query statements : [TPCH-Query-SQL](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools/queries)
+TPCH 22 test query statements : [TPCH-Query-SQL](https://github.com/apache/doris/tree/master/tools/tpch-tools/queries)
 
 Notice:
 
@@ -131,10 +131,10 @@ Please refer to the [official document](../install/install-deploy.md) to install
 
 ### 7.1 Download and install the TPC-H data generation tool
 
-Execute the following script to download and compile the [tpch-tools](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools) tool.
+Execute the following script to download and compile the [tpch-tools](https://github.com/apache/doris/tree/master/tools/tpch-tools) tool.
 
 ```shell
-sh build-tpch-dbgen.sh
+sh bin/build-tpch-dbgen.sh
 ```
 
 After successful installation, the `dbgen` binary will be generated in the `TPC-H_Tools_v3.0.0/` directory.
@@ -144,12 +144,12 @@ After successful installation, the `dbgen` binary will be generated in the `TPC-
 Execute the following script to generate the TPC-H dataset:
 
 ```shell
-sh gen-tpch-data.sh
+sh bin/gen-tpch-data.sh
 ```
 
-> Note 1: View script help via `sh gen-tpch-data.sh -h`.
+> Note 1: View script help via `sh bin/gen-tpch-data.sh -h`.
 >
-> Note 2: The data will be generated in the `tpch-data/` directory with the suffix `.tbl`. The total file size is about 100GB. The generation time may vary from a few minutes to an hour.
+> Note 2: The data will be generated in the `bin/tpch-data/` directory with the suffix `.tbl`. The total file size is about 100GB. The generation time may vary from a few minutes to an hour.
 >
 > Note 3: The standard test data set of 100G is generated by default
 
@@ -157,7 +157,7 @@ sh gen-tpch-data.sh
 
 #### 7.3.1 Prepare the `doris-cluster.conf` file
 
-Before calling the import script, you need to write the FE's ip port and other information in the `doris-cluster.conf` file.
+Before calling the import script, you need to write the FE's ip port and other information in the `conf/doris-cluster.conf` file.
 
 File location and `load-tpch-data.sh` level.
 
@@ -175,7 +175,7 @@ export USER='root'
 # Doris password
 export PASSWORD=''
 # The database where TPC-H tables located
-export DB='tpch1'
+export DB='tpch'
 ```
 
 #### 7.3.2 Execute the following script to generate and create the TPC-H table
@@ -183,7 +183,7 @@ export DB='tpch1'
 ```shell
 sh create-tpch-tables.sh
 ```
-Or copy the table creation statement in [create-tpch-tables.sql](https://github.com/apache/incubator-doris/blob/master/tools/tpch-tools/create-tpch-tables.sql), Execute in Doris.
+Or copy the table creation statements in [create-tpch-tables.sql](https://github.com/apache/doris/blob/master/tools/tpch-tools/ddl/create-tpch-tables.sql) and execute them in Doris.
 
 ### 7.4 导入数据
 
@@ -191,7 +191,7 @@ Or copy the table creation statement in [create-tpch-tables.sql](https://github.
 通过下面的命令执行数据导入:
 
 ```shell
-sh ./load-tpch-data.sh
+sh bin/load-tpch-data.sh
 ```
 
 ### 7.5 Check Imported Data
 
@@ -215,7 +215,7 @@ select count(*) from revenue0;
 Execute the above test SQL or execute the following command
 
 ```
-./run-tpch-queries.sh
+sh bin/run-tpch-queries.sh
 ```
 
 >Notice:
 
@@ -223,7 +223,3 @@ Execute the above test SQL or execute the following command
 >1. At present, the query optimizer and statistics functions of Doris are not perfect, so we rewrite some queries in TPC-H to adapt to the execution framework of Doris, but it does not affect the correctness of the results
 >
 >2. Doris' new query optimizer will be released in subsequent versions
->3. Set `set mem_exec_limit=8G` before executing the query
-
-
-
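The timing loop in `run-tpch-queries.sh` (shown at the end of this patch) can be reproduced by hand for a single query. A minimal sketch, assuming the connection values from `conf/doris-cluster.conf` and the query files under `tools/tpch-tools/queries/`:

```shell
# Time one TPC-H query roughly the way run-tpch-queries.sh does:
# take millisecond timestamps before and after, and report the difference.
start=$(date +%s%3N)
mysql -h127.0.0.1 -P9030 -uroot -D tpch <queries/q1.sql >/dev/null
end=$(date +%s%3N)
echo "q1: $((end - start)) ms"
```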
diff --git a/docs/zh-CN/docs/benchmark/ssb.md b/docs/zh-CN/docs/benchmark/ssb.md
index 14ede8a6c9..3164e6fe08 100644
--- a/docs/zh-CN/docs/benchmark/ssb.md
+++ b/docs/zh-CN/docs/benchmark/ssb.md
@@ -117,30 +117,30 @@ enable_low_cardinality_optimize=true
 执行以下脚本下载并编译 [ssb-dbgen](https://github.com/electrum/ssb-dbgen.git) 工具。
 
 ```shell
-sh build-ssb-dbgen.sh
+bash bin/build-ssb-dbgen.sh
 ```
 
-安装成功后,将在 `ssb-dbgen/` 目录下生成 `dbgen` 二进制文件。
+安装成功后,将在 `bin/ssb-dbgen/` 目录下生成 `dbgen` 二进制文件。
 
 ### 6.2 生成 SSB 测试集
 
 执行以下脚本生成 SSB 数据集:
 
 ```shell
-sh gen-ssb-data.sh -s 100 -c 100
+bash bin/gen-ssb-data.sh
 ```
 
-> 注1:通过 `sh gen-ssb-data.sh -h` 查看脚本帮助。
+> 注1:通过 `bash bin/gen-ssb-data.sh -h` 查看脚本帮助,默认 scale factor 为 100(简称 sf100),默认生成 10 个数据文件,即 `bash bin/gen-ssb-data.sh -s 100 -c 10`,耗时数分钟。
 >
-> 注2:数据会以 `.tbl` 为后缀生成在 `ssb-data/` 目录下。文件总大小约60GB。生成时间可能在数分钟到1小时不等。
+> 注2:数据会以 `.tbl` 为后缀生成在 `bin/ssb-data/` 目录下。文件总大小约60GB。生成时间可能在数分钟到1小时不等,生成完成后会列出生成文件的信息。
 >
-> 注3:`-s 100` 表示测试集大小系数为 100,`-c 100` 表示并发100个线程生成 lineorder 表的数据。`-c` 参数也决定了最终 lineorder 表的文件数量。参数越大,文件数越多,每个文件越小。
+> 注3:`-s 100` 表示测试集大小系数为 100,`-c 10` 表示并发10个线程生成 lineorder 表的数据。`-c` 参数也决定了最终 lineorder 表的文件数量。参数越大,文件数越多,每个文件越小。测试 sf100 用默认参数即可,测试 sf1000 用 `-s 1000 -c 100`。
 
 在 `-s 100` 参数下,生成的数据集大小为:
 
 | Table     | Rows             | Size | File Number |
 | --------- | ---------------- | ---- | ----------- |
-| lineorder | 6亿(600037902) | 60GB | 100 |
+| lineorder | 6亿(600037902) | 60GB | 10 |
 | customer  | 300万(3000000) | 277M | 1 |
 | part      | 140万(1400000) | 116M | 1 |
 | supplier  | 20万(200000)   | 17M  | 1 |
@@ -148,38 +148,30 @@ sh gen-ssb-data.sh -s 100 -c 100
 
 ### 6.3 建表
 
-#### 6.3.1 准备 `doris-cluster.conf` 文件。
+#### 6.3.1 准备 `conf/doris-cluster.conf` 文件。
 
-在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 `doris-cluster.conf` 文件中。
+在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 `conf/doris-cluster.conf` 文件中。
 
-文件位置和 `load-ssb-dimension-data.sh` 平级。
-
-文件内容包括 FE 的 ip,HTTP 端口,用户名,密码以及待导入数据的 DB 名称:
+文件内容包括 FE 的 ip,HTTP 端口,用户名,密码(默认为空)以及待导入数据的 DB 名称:
 
 ```shell
-export FE_HOST="xxx"
+export FE_HOST="127.0.0.1"
 export FE_HTTP_PORT="8030"
 export FE_QUERY_PORT="9030"
 export USER="root"
-export PASSWORD='xxx'
+export PASSWORD=""
 export DB="ssb"
 ```
 
 #### 6.3.2 执行以下脚本生成创建 SSB 表:
 
 ```shell
-sh create-ssb-tables.sh
+bash bin/create-ssb-tables.sh
 ```
 
 或者复制 [create-ssb-tables.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql) 中的建表语句,在 Doris 中执行。
+复制 [create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql) 中的建表语句,在 Doris 中执行。
 
-#### 6.3.3 执行以下脚本生成创建 SSB flat 表:
-
-```shell
-sh create-ssb-flat-table.sh
-```
-或者复制 [create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql) 中的建表语句,在 Doris 中执行。
-
-下面是 `lineorder_flat` 表建表语句。在上面的 `create-ssb-flat-table.sh` 脚本中创建"lineorder_flat"表,并进行了默认分桶数(48个桶)。您可以删除该表,根据您的集群规模节点配置对这个分桶数进行调整,这样可以获取到更好的一个测试效果。
+下面是 `lineorder_flat` 表建表语句。上面的 `bin/create-ssb-tables.sh` 脚本会以默认分桶数(48个桶)创建 `lineorder_flat` 表。您可以删除该表,根据您的集群规模和节点配置调整分桶数,以获得更好的测试效果。
 
 ```sql
 CREATE TABLE `lineorder_flat` (
@@ -246,35 +238,17 @@ PROPERTIES (
 
 ### 6.4 导入数据
 
-#### 6.4.1 导入 4 张维度表数据
-
-因为这 4 张维表(customer, part, supplier and date)数据量较小,导入较简单,我们使用以下命令先导入这4表的数据:
+下面的脚本根据 `conf/doris-cluster.conf` 中的参数连接 Doris 进行导入:单线程导入数据量较小的 4 张维度表(customer, part, supplier and date),并发导入 1 张事实表(lineorder),并采用 'INSERT INTO ... SELECT ...' 的方式导入宽表(lineorder_flat)。
 
 ```shell
-sh load-ssb-dimension-data.sh
+bash bin/load-ssb-data.sh
 ```
 
-#### 6.4.2 导入事实表 lineorder。
-
-通过以下命令导入 lineorder 表数据
-
-````shell
-sh load-ssb-fact-data.sh -c 5
-````
-
-`-c 5` 表示启动 5 个并发线程导入(默认为3)。在单 BE 节点情况下,由 `sh gen-ssb-data.sh -s 100 -c 100` 生成的 lineorder 数据,使用 `sh load-ssb-fact-data.sh -c 3` 的导入时间约为 10min。内存开销约为 5-6GB。如果开启更多线程,可以加快导入速度,但会增加额外的内存开销。
-
-> 注:为获得更快的导入速度,你可以在 be.conf 中添加 `flush_thread_num_per_store=5` 后重启BE。该配置表示每个数据目录的写盘线程数,默认为2。较大的数据可以提升写数据吞吐,但可能会增加 IO Util。(参考值:1块机械磁盘,在默认为2的情况下,导入过程中的 IO Util 约为12%,设置为5时,IO Util 约为26%。如果是 SSD 盘,则几乎为 0)。
-
-#### 6.4.3 导入flat表
-
-通过以下命令导入 lineorder_flat 表数据:
-
-```shell
-sh load-ssb-flat-data.sh
-```
-
-> 注:flat 表数据采用 'INSERT INTO ... SELECT ... ' 的方式导入。
+> 注1:通过 `bash bin/load-ssb-data.sh -h` 查看脚本帮助,默认 5 线程并发导入 lineorder,即 `-c 5`。如果开启更多线程,可以加快导入速度,但会增加额外的内存开销。
+>
+> 注2:为获得更快的导入速度,你可以在 be.conf 中添加 `flush_thread_num_per_store=5` 后重启BE。该配置表示每个数据目录的写盘线程数,默认为2。较大的数值可以提升写数据吞吐,但可能会增加 IO Util。(参考值:1块机械磁盘,在默认为2的情况下,导入过程中的 IO Util 约为12%,设置为5时,IO Util 约为26%。如果是 SSD 盘,则几乎为 0)。
+>
+> 注3:导入 customer、part、supplier、date 及 lineorder 表耗时约 389s,打平到 lineorder_flat 耗时约 740s。
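Note 2 above (in both language versions) suggests raising the per-directory write-thread count before a large import. A minimal sketch of that change, with the BE install path assumed:

```shell
# Append the setting to be.conf and restart BE (adjust the path to your deployment).
echo 'flush_thread_num_per_store=5' >>/opt/doris/be/conf/be.conf
sh /opt/doris/be/bin/stop_be.sh
sh /opt/doris/be/bin/start_be.sh --daemon
```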
 
 ### 6.5 检查导入数据
 
@@ -300,7 +274,15 @@ select count(*) from lineorder_flat;
 
 ### 6.6 查询测试
 
-#### 6.6.1 测试SQL
+#### 6.6.1 测试脚本
+
+下面脚本根据 `conf/doris-cluster.conf` 中的参数连接 Doris,执行查询前会先打印出各表的数据行数。
+
+```shell
+bash bin/run-ssb-flat-queries.sh
+```
+
+#### 6.6.2 测试SQL
 
 ```sql
 --Q1.1
diff --git a/docs/zh-CN/docs/benchmark/tpch.md b/docs/zh-CN/docs/benchmark/tpch.md
index 2c68c86433..66499eb4a7 100644
--- a/docs/zh-CN/docs/benchmark/tpch.md
+++ b/docs/zh-CN/docs/benchmark/tpch.md
@@ -73,7 +73,7 @@ TPC-H是一个决策支持基准(Decision Support Benchmark),它由一套
 
 ## 4. 测试SQL
 
-TPCH 22个测试查询语句 : [TPCH-Query-SQL](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools/queries)
+TPCH 22个测试查询语句 : [TPCH-Query-SQL](https://github.com/apache/doris/tree/master/tools/tpch-tools/queries)
 
 **注意:**
 
@@ -134,10 +134,10 @@ TPCH 22个测试查询语句 : [TPCH-Query-SQL](https://github.com/apache/incu
 
 ### 7.1 下载安装 TPC-H 数据生成工具
 
-执行以下脚本下载并编译 [tpch-tools](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools) 工具。
+执行以下脚本下载并编译 [tpch-tools](https://github.com/apache/doris/tree/master/tools/tpch-tools) 工具。
 
 ```shell
-sh build-tpch-dbgen.sh
+sh bin/build-tpch-dbgen.sh
 ```
 
 安装成功后,将在 `TPC-H_Tools_v3.0.0/` 目录下生成 `dbgen` 二进制文件。
@@ -147,12 +147,12 @@ sh build-tpch-dbgen.sh
 
 执行以下脚本生成 TPC-H 数据集:
 
 ```shell
-sh gen-tpch-data.sh
+sh bin/gen-tpch-data.sh
 ```
 
-> 注1:通过 `sh gen-tpch-data.sh -h` 查看脚本帮助。
+> 注1:通过 `sh bin/gen-tpch-data.sh -h` 查看脚本帮助。
 >
-> 注2:数据会以 `.tbl` 为后缀生成在 `tpch-data/` 目录下。文件总大小约100GB。生成时间可能在数分钟到1小时不等。
+> 注2:数据会以 `.tbl` 为后缀生成在 `bin/tpch-data/` 目录下。文件总大小约100GB。生成时间可能在数分钟到1小时不等。
 >
 > 注3:默认生成 100G 的标准测试数据集
 
@@ -160,9 +160,7 @@ sh gen-tpch-data.sh
 
 #### 7.3.1 准备 `doris-cluster.conf` 文件
 
-在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 `doris-cluster.conf` 文件中。
-
-文件位置和 `load-tpch-data.sh` 平级。
+在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 `conf/doris-cluster.conf` 文件中。
 
 文件内容包括 FE 的 ip,HTTP 端口,用户名,密码以及待导入数据的 DB 名称:
 
@@ -178,15 +176,15 @@ export USER='root'
 # Doris password
 export PASSWORD=''
 # The database where TPC-H tables located
-export DB='tpch1'
+export DB='tpch'
 ```
 
 #### 7.3.2 执行以下脚本生成创建 TPC-H 表
 
 ```shell
-sh create-tpch-tables.sh
+sh bin/create-tpch-tables.sh
 ```
-或者复制 [create-tpch-tables.sql](https://github.com/apache/incubator-doris/blob/master/tools/tpch-tools/create-tpch-tables.sql) 中的建表语句,在 Doris 中执行。
+或者复制 [create-tpch-tables.sql](https://github.com/apache/doris/blob/master/tools/tpch-tools/ddl/create-tpch-tables.sql) 中的建表语句,在 Doris 中执行。
 
 ### 7.4 导入数据
 
@@ -194,7 +192,7 @@ sh create-tpch-tables.sh
 通过下面的命令执行数据导入:
 
 ```shell
-sh ./load-tpch-data.sh
+sh bin/load-tpch-data.sh
 ```
 
 ### 7.5 检查导入数据
 
@@ -218,7 +216,7 @@ select count(*) from revenue0;
 执行上面的测试 SQL 或者 执行下面的命令
 
 ```
-./run-tpch-queries.sh
+sh bin/run-tpch-queries.sh
 ```
 
 >注意:
 
@@ -226,7 +224,3 @@ select count(*) from revenue0;
 >1. 目前Doris的查询优化器和统计信息功能还不完善,所以我们在TPC-H中重写了一些查询以适应Doris的执行框架,但不影响结果的正确性
 >
 >2. Doris 新的查询优化器将在后续的版本中发布
->3. 执行查询之前设置 `set mem_exec_limit=8G`
-
-
-
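The query scripts below all measure each query with mysqlslap: the query file is executed three times and the average run time is scraped from the second line of mysqlslap's summary. A standalone sketch of that one-liner, with connection values and the query path assumed:

```shell
# Average runtime of one query over 3 iterations, as in the scripts below;
# fields 9 and 10 of the summary line hold "<seconds> seconds".
mysqlslap -h127.0.0.1 -P9030 -uroot --create-schema=ssb \
    --query=queries/q1.1.sql -F '\r' -i 3 | sed -n '2p' | cut -d ' ' -f 9,10
```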
diff --git a/tools/ssb-tools/bin/gen-ssb-data.sh b/tools/ssb-tools/bin/gen-ssb-data.sh
index 4f06f9ead1..871c3cf3e9 100755
--- a/tools/ssb-tools/bin/gen-ssb-data.sh
+++ b/tools/ssb-tools/bin/gen-ssb-data.sh
@@ -110,6 +110,7 @@ fi
 mkdir "${SSB_DATA_DIR}/"
 
 # gen data
+date
 cd "${SSB_DBGEN_DIR}"
 echo "Begin to generate data for table: customer"
 "${SSB_DBGEN_DIR}/dbgen" -f -s "${SCALE_FACTOR}" -T c
@@ -121,6 +122,7 @@ echo "Begin to generate data for table: date"
 "${SSB_DBGEN_DIR}/dbgen" -f -s "${SCALE_FACTOR}" -T d
 echo "Begin to generate data for table: lineorder"
 "${SSB_DBGEN_DIR}/dbgen" -f -s "${SCALE_FACTOR}" -T l -C "${PARALLEL}"
+date
 cd -
 
 # move data to $SSB_DATA_DIR
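The two `date` calls added above print wall-clock timestamps around data generation. The elapsed time can also be computed directly; a standalone sketch:

```shell
# Wrap the generator and report elapsed seconds.
start=$(date +%s)
bash bin/gen-ssb-data.sh
echo "generation took $(($(date +%s) - start))s"
```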
diff --git a/tools/ssb-tools/bin/run-ssb-flat-queries.sh b/tools/ssb-tools/bin/run-ssb-flat-queries.sh
index 74d4b71691..7a2154c62c 100755
--- a/tools/ssb-tools/bin/run-ssb-flat-queries.sh
+++ b/tools/ssb-tools/bin/run-ssb-flat-queries.sh
@@ -85,6 +85,8 @@ check_prerequest() {
 }
 
 check_prerequest "mysqlslap --version" "mysqlslap"
+check_prerequest "mysql --version" "mysql"
+check_prerequest "bc --version" "bc"
 
 source "${CURDIR}/../conf/doris-cluster.conf"
 export MYSQL_PWD=${PASSWORD}
@@ -95,25 +97,57 @@ echo "USER: ${USER}"
 echo "PASSWORD: ${PASSWORD}"
 echo "DB: ${DB}"
 
-pre_set() {
+run_sql() {
     echo "$@"
     mysql -h"${FE_HOST}" -P"${FE_QUERY_PORT}" -u"${USER}" -D"${DB}" -e "$@"
 }
 
-pre_set "set global enable_vectorized_engine=1;"
-pre_set "set global parallel_fragment_exec_instance_num=8;"
-pre_set "set global exec_mem_limit=8G;"
-pre_set "set global batch_size=4096;"
 echo '============================================'
-pre_set "show variables;"
+echo "optimize some session variables before run, and then restore it after run."
+origin_enable_vectorized_engine=$(
+    set -e
+    run_sql 'select @@enable_vectorized_engine;' | sed -n '3p'
+)
+origin_parallel_fragment_exec_instance_num=$(
+    set -e
+    run_sql 'select @@parallel_fragment_exec_instance_num;' | sed -n '3p'
+)
+origin_exec_mem_limit=$(
+    set -e
+    run_sql 'select @@exec_mem_limit;' | sed -n '3p'
+)
+origin_batch_size=$(
+    set -e
+    run_sql 'select @@batch_size;' | sed -n '3p'
+)
+run_sql "set global enable_vectorized_engine=1;"
+run_sql "set global parallel_fragment_exec_instance_num=8;"
+run_sql "set global exec_mem_limit=8G;"
+run_sql "set global batch_size=4096;"
+echo '============================================'
+run_sql "show variables;"
 echo '============================================'
-pre_set "show table status;"
+run_sql "show table status;"
 echo '============================================'
 
+sum=0
 for i in '1.1' '1.2' '1.3' '2.1' '2.2' '2.3' '3.1' '3.2' '3.3' '3.4' '4.1' '4.2' '4.3'; do
     # First run to prevent the affect of cold start
     mysql -h"${FE_HOST}" -P"${FE_QUERY_PORT}" -u"${USER}" -D "${DB}" <"${QUERIES_DIR}/q${i}.sql" >/dev/null 2>&1
     # Then run 3 times and takes the average time
     res=$(mysqlslap -h"${FE_HOST}" -P"${FE_QUERY_PORT}" -u"${USER}" --create-schema="${DB}" --query="${QUERIES_DIR}/q${i}.sql" -F '\r' -i 3 | sed -n '2p' | cut -d ' ' -f 9,10)
     echo "q${i}: ${res}"
+    cost=$(echo "${res}" | cut -d' ' -f1)
+    sum=$(echo "${sum} + ${cost}" | bc)
 done
+echo "total time: ${sum} seconds"
+
+echo '============================================'
+echo "restore session variables"
+run_sql "set global enable_vectorized_engine=${origin_enable_vectorized_engine};"
+run_sql "set global parallel_fragment_exec_instance_num=${origin_parallel_fragment_exec_instance_num};"
+run_sql "set global exec_mem_limit=${origin_exec_mem_limit};"
+run_sql "set global batch_size=${origin_batch_size};"
+echo '============================================'
+
+echo 'Done.'
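Both run-ssb scripts now save each global session variable before overriding it and restore it afterwards. The saved value is the third line of `run_sql`'s output because `run_sql` first echoes the statement, after which mysql prints a header row and then the value row. A standalone sketch of the same round trip (connection values assumed; without the leading echo the value sits on line 2, hence `2p` instead of `3p`):

```shell
# Save the current value of a global session variable.
origin_exec_mem_limit=$(mysql -h127.0.0.1 -P9030 -uroot -D ssb \
    -e 'select @@exec_mem_limit;' | sed -n '2p')

# Override it for the benchmark run.
mysql -h127.0.0.1 -P9030 -uroot -D ssb -e 'set global exec_mem_limit=8G;'

# ... run the benchmark queries ...

# Restore the saved value.
mysql -h127.0.0.1 -P9030 -uroot -D ssb \
    -e "set global exec_mem_limit=${origin_exec_mem_limit};"
```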
diff --git a/tools/ssb-tools/bin/run-ssb-queries.sh b/tools/ssb-tools/bin/run-ssb-queries.sh
index 035a4dab11..09996aa04c 100755
--- a/tools/ssb-tools/bin/run-ssb-queries.sh
+++ b/tools/ssb-tools/bin/run-ssb-queries.sh
@@ -85,6 +85,8 @@ check_prerequest() {
 }
 
 check_prerequest "mysqlslap --version" "mysql slap"
+check_prerequest "mysql --version" "mysql"
+check_prerequest "bc --version" "bc"
 
 source "${CURDIR}/../conf/doris-cluster.conf"
 export MYSQL_PWD=${PASSWORD}
@@ -95,26 +97,67 @@ echo "USER: ${USER}"
 echo "PASSWORD: ${PASSWORD}"
 echo "DB: ${DB}"
 
-pre_set() {
+run_sql() {
    echo "$@"
    mysql -h"${FE_HOST}" -P"${FE_QUERY_PORT}" -u"${USER}" -D"${DB}" -e "$@"
 }
 
-pre_set "set global enable_vectorized_engine=1;"
-pre_set "set global parallel_fragment_exec_instance_num=8;"
-pre_set "set global exec_mem_limit=48G;"
-pre_set "set global batch_size=4096;"
-pre_set "set global enable_projection=true;"
-pre_set "set global runtime_filter_mode=global;"
-# pre_set "set global enable_cost_based_join_reorder=1"
 echo '============================================'
-pre_set "show variables;"
+echo "optimize some session variables before run, and then restore it after run."
+origin_enable_vectorized_engine=$(
+    set -e
+    run_sql 'select @@enable_vectorized_engine;' | sed -n '3p'
+)
+origin_parallel_fragment_exec_instance_num=$(
+    set -e
+    run_sql 'select @@parallel_fragment_exec_instance_num;' | sed -n '3p'
+)
+origin_exec_mem_limit=$(
+    set -e
+    run_sql 'select @@exec_mem_limit;' | sed -n '3p'
+)
+origin_batch_size=$(
+    set -e
+    run_sql 'select @@batch_size;' | sed -n '3p'
+)
+origin_enable_projection=$(
+    set -e
+    run_sql 'select @@enable_projection;' | sed -n '3p'
+)
+origin_runtime_filter_mode=$(
+    set -e
+    run_sql 'select @@runtime_filter_mode;' | sed -n '3p'
+)
+run_sql "set global enable_vectorized_engine=1;"
+run_sql "set global parallel_fragment_exec_instance_num=8;"
+run_sql "set global exec_mem_limit=48G;"
+run_sql "set global batch_size=4096;"
+run_sql "set global enable_projection=true;"
+run_sql "set global runtime_filter_mode=global;"
 echo '============================================'
-pre_set "show table status;"
+run_sql "show variables;"
+echo '============================================'
+run_sql "show table status;"
 echo '============================================'
 
+sum=0
 for i in '1.1' '1.2' '1.3' '2.1' '2.2' '2.3' '3.1' '3.2' '3.3' '3.4' '4.1' '4.2' '4.3'; do
     # Each query is executed 3 times and takes the average time
     res=$(mysqlslap -h"${FE_HOST}" -P"${FE_QUERY_PORT}" -u"${USER}" --create-schema="${DB}" --query="${QUERIES_DIR}/q${i}.sql" -F '\r' -i 3 | sed -n '2p' | cut -d ' ' -f 9,10)
     echo "q${i}: ${res}"
+    cost=$(echo "${res}" | cut -d' ' -f1)
+    sum=$(echo "${sum} + ${cost}" | bc)
 done
+echo "total time: ${sum} seconds"
+
+echo '============================================'
+echo "restore session variables"
+run_sql "set global enable_vectorized_engine=${origin_enable_vectorized_engine};"
+run_sql "set global parallel_fragment_exec_instance_num=${origin_parallel_fragment_exec_instance_num};"
+run_sql "set global exec_mem_limit=${origin_exec_mem_limit};"
+run_sql "set global batch_size=${origin_batch_size};"
+run_sql "set global enable_projection=${origin_enable_projection};"
+run_sql "set global runtime_filter_mode=${origin_runtime_filter_mode};"
+echo '============================================'
+
+echo 'Done.'

diff --git a/tools/tpch-tools/bin/run-tpch-queries.sh b/tools/tpch-tools/bin/run-tpch-queries.sh
index 6a05a47010..25749de5e7 100755
--- a/tools/tpch-tools/bin/run-tpch-queries.sh
+++ b/tools/tpch-tools/bin/run-tpch-queries.sh
@@ -95,15 +95,15 @@ echo "PASSWORD: ${PASSWORD}"
 echo "DB: ${DB}"
 echo "Time Unit: ms"
 
-pre_set() {
+run_sql() {
     echo "$*"
     mysql -h"${FE_HOST}" -u"${USER}" -P"${FE_QUERY_PORT}" -D"${DB}" -e "$*"
 }
 
 echo '============================================'
-pre_set "show variables;"
+run_sql "show variables;"
 echo '============================================'
-pre_set "show table status;"
+run_sql "show table status;"
 echo '============================================'
 
 sum=0
@@ -118,7 +118,7 @@ for i in $(seq 1 22); do
         total=$((total + end - start))
     done
     cost=$((total / run))
-    echo "q${i}: ${cost}"
+    echo "q${i}: ${cost} ms"
     sum=$((sum + cost))
 done
-echo "Total cost: ${sum}"
+echo "Total cost: ${sum} ms"

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org