This is an automated email from the ASF dual-hosted git repository.

jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new d6c1f958751 fix
d6c1f958751 is described below

commit d6c1f9587519e92f1453771aede478730113b52c
Author: jiafeng.zhang <zhang...@gmail.com>
AuthorDate: Tue Nov 22 08:43:40 2022 +0800

    fix
---
 docs/benchmark/ssb.md | 175 +++++++++++++++++++++++++------------------------
 docs/benchmark/tpch.md | 99 ++++++++++++++--------------
 2 files changed, 137 insertions(+), 137 deletions(-)

diff --git a/docs/benchmark/ssb.md b/docs/benchmark/ssb.md
index eeca264fb6c..d58db90664b 100644
--- a/docs/benchmark/ssb.md
+++ b/docs/benchmark/ssb.md
@@ -26,55 +26,51 @@ under the License.
 # Star Schema Benchmark
-[Star Schema Benchmark(SSB)](https://www.cs.umb.edu/~poneil/StarSchemaB.PDF) is a performance test set in a lightweight data warehouse scenario. Based on [TPC-H](http://www.tpc.org/tpch/), SSB provides a simplified version of the star schema dataset, which is mainly used to test the performance of multi-table association queries under the star schema. . In addition, the industry usually flattens SSB as a wide table model (hereinafter referred to as: SSB flat) to test the performance of t [...]
+[Star Schema Benchmark (SSB)](https://www.cs.umb.edu/~poneil/StarSchemaB.PDF) is a lightweight performance test set for data warehouse scenarios. Based on [TPC-H](http://www.tpc.org/tpch/), SSB provides a simplified star schema dataset that is mainly used to test the performance of multi-table JOIN queries under the star schema. In addition, the industry usually flattens SSB into a wide table model (referred to as SSB flat) to test the performance of the query engine; refer to [Clickhouse](https [...]
 This document mainly introduces the performance of Doris on the SSB 100G test set.
-> Note 1: The standard test set including SSB is usually far from the actual business scenario, and some tests will perform parameter tuning for the test set. Therefore, the test results of the standard test set can only reflect the performance of the database in specific scenarios. Users are advised to conduct further testing with actual business data.
+> Note 1: Standard test sets, including SSB, usually differ considerably from actual business scenarios, and some tests tune parameters specifically for the test set. Therefore, the results of a standard test set only reflect the performance of the database in specific scenarios. Users are advised to run further tests with actual business data.
 >
-> Note 2: The operations involved in this document are all performed in the Ubuntu Server 20.04 environment, and CentOS 7 can also be tested.
+> Note 2: The operations in this document were all performed on Ubuntu Server 20.04; CentOS 7 can also be used.
-We conducted pairwise testing on 13 queries on the SSB standard test dataset based on Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions.
+We ran the 13 queries of the SSB standard test dataset to compare Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04.
-The overall performance improvement on SSB FlAT wide tables was nearly 4x on Apache Doris 1.2.0-rc01 compared to Apache Doris 1.1.3, and nearly 10x on Apache Doris 0.15.0 RC04.
+On the SSB flat wide table, Apache Doris 1.2.0-rc01 is nearly 4 times faster overall than Apache Doris 1.1.3 and nearly 10 times faster than Apache Doris 0.15.0 RC04 (total query time of 880 ms vs. 3,150 ms and 8,030 ms).
-
-
-On the standard SSB test SQL, Apache Doris 1.2.0-rc01 delivers an overall performance improvement of nearly 2X over Apache Doris 1.1.3 and nearly 31X over Apache Doris 0.15.0 RC04.
-
-
+On the standard SSB test SQL, Apache Doris 1.2.0-rc01 is nearly 2 times faster overall than Apache Doris 1.1.3 and nearly 31 times faster than Apache Doris 0.15.0 RC04 (total query time of 4,160 ms vs. 8,478 ms and 131,910 ms).
 ## 1. Hardware Environment
-| Number of machines | 4 Tencent Cloud hosts (1 FE, 3 BE) |
+| Number of machines | 4 Tencent Cloud Hosts (1 FE, 3 BEs) |
 | ------------------ | ----------------------------------------- |
-| CPU | AMD EPYC™ Milan (2.55GHz/3.5GHz) 16 cores |
+| CPU | AMD EPYC™ Milan (2.55GHz/3.5GHz) 16 Cores |
 | Memory | 64G |
 | Network Bandwidth | 7Gbps |
-| Disk | High-performance cloud disk |
+| Disk | High-performance Cloud Disk |
 ## 2. Software Environment
-- Doris deploys 3BE 1FE;
+- Doris deployed with 3 BEs and 1 FE;
 - Kernel version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
-- OS version: Ubuntu Server 20.04 LTS 64 bit
-- Doris software version: Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 , Apache Doris 0.15.0 RC04
+- OS version: Ubuntu Server 20.04 LTS 64-bit
+- Doris software versions: Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04
 - JDK: openjdk version "11.0.14" 2022-01-18
-## 3. Test data volume
+## 3. Test Data Volume
-| SSB table name | number of rows | remarks |
+| SSB Table Name | Rows | Description |
 | :------------- | :------------- | :------------------------------- |
-| lineorder | 600,037,902 | Commodity order list |
-| customer | 3,000,000 | Customer Information Sheet |
-| part | 1,400,000 | Parts Information Sheet |
-| supplier | 200,000 | Supplier Information Sheet |
-| date | 2,556 | Date table |
-| lineorder_flat | 600,037,902 | Wide table after data flattening |
+| lineorder | 600,037,902 | Commodity Order Details |
+| customer | 3,000,000 | Customer Information |
+| part | 1,400,000 | Parts Information |
+| supplier | 200,000 | Supplier Information |
+| date | 2,556 | Date |
+| lineorder_flat | 600,037,902 | Wide Table after Data Flattening |
 ## 4. Test Results
-Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions for comparative testing, with the following results.
+We use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 for comparative testing. The test results are as follows:
 | Query | Apache Doris 1.2.0-rc01(ms) | Apache Doris 1.1.3 (ms) | Doris 0.15.0 RC04 (ms) |
 | ----- | ------------- | ------------- | ----------------- |
@@ -82,60 +78,64 @@ Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0
 | Q1.2 | 10 | 10 | 30 |
 | Q1.3 | 30 | 70 | 120 |
 | Q2.1 | 90 | 360 | 900 |
-| Q2.2 | 90 | 340 | 1020 |
+| Q2.2 | 90 | 340 | 1,020 |
 | Q2.3 | 60 | 260 | 770 |
-| Q3.1 | 160 | 550 | 1710 |
+| Q3.1 | 160 | 550 | 1,710 |
 | Q3.2 | 80 | 290 | 670 |
 | Q3.3 | 90 | 240 | 550 |
 | Q3.4 | 20 | 20 | 30 |
-| Q4.1 | 140 | 480 | 1250 |
+| Q4.1 | 140 | 480 | 1,250 |
 | Q4.2 | 50 | 240 | 400 |
 | Q4.3 | 30 | 200 | 330 |
-| Total | 880 | 3150 | 8030 |
+| Total | 880 | 3,150 | 8,030 |
+
+
-**Interpretation of results**
+**Interpretation of Results**
 - The data set corresponding to the test results is scale 100, about 600 million.
-- The test environment is configured to be commonly used by users, including 4 cloud servers, 16-core 64G SSD, and 1 FE and 3 BE deployment.
-- Use common user configuration tests to reduce user selection and evaluation costs, but will not consume so many hardware resources during the entire test process.
+- The test environment uses a configuration common among users: 4 cloud servers (16 cores, 64 GB memory, SSD) with 1 FE and 3 BEs deployed.
+- Testing on a common user configuration lowers the cost of hardware selection and evaluation for users; the test itself does not consume all of these hardware resources.
-## 5. Standard SSB test results
+## 5. Standard SSB Test Results
-Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions for comparative testing, with the following results.
+We use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 for comparative testing, with query time (ms) as the main performance indicator. The test results are as follows:
 | Query | Apache Doris 1.2.0-rc01 (ms) | Apache Doris 1.1.3 (ms) | Doris 0.15.0 RC04 (ms) |
 | ----- | ------- | ---------------------- | ------------------------------- |
 | Q1.1 | 40 | 18 | 350 |
 | Q1.2 | 30 | 100 | 80 |
 | Q1.3 | 20 | 70 | 80 |
-| Q2.1 | 350 | 940 | 20680 |
-| Q2.2 | 320 | 750 | 18250 |
-| Q2.3 | 300 | 720 | 14760 |
-| Q3.1 | 650 | 2150 | 22190 |
-| Q3.2 | 260 | 510 | 8360 |
-| Q3.3 | 220 | 450 | 6200 |
+| Q2.1 | 350 | 940 | 20,680 |
+| Q2.2 | 320 | 750 | 18,250 |
+| Q2.3 | 300 | 720 | 14,760 |
+| Q3.1 | 650 | 2,150 | 22,190 |
+| Q3.2 | 260 | 510 | 8,360 |
+| Q3.3 | 220 | 450 | 6,200 |
 | Q3.4 | 60 | 70 | 160 |
-| Q4.1 | 840 | 1480 | 24320 |
-| Q4.2 | 460 | 560 | 6310 |
-| Q4.3 | 610 | 660 | 10170 |
-| Total | 4160 | 8478 | 131910 |
+| Q4.1 | 840 | 1,480 | 24,320 |
+| Q4.2 | 460 | 560 | 6,310 |
+| Q4.3 | 610 | 660 | 10,170 |
+| Total | 4,160 | 8,478 | 131,910 |
+
+
-**Interpretation of results**
+**Interpretation of Results**
 - The data set corresponding to the test results is scale 100, about 600 million.
-- The test environment is configured to be commonly used by users, including 4 cloud servers, 16-core 64G SSD, and 1 FE and 3 BE deployment.
-- Use common user configuration tests to reduce user selection and evaluation costs, but will not consume so many hardware resources during the entire test process.
+- The test environment uses a configuration common among users: 4 cloud servers (16 cores, 64 GB memory, SSD) with 1 FE and 3 BEs deployed.
+- Testing on a common user configuration lowers the cost of hardware selection and evaluation for users; the test itself does not consume all of these hardware resources.
 ## 6. Environment Preparation
-Please first refer to the [official documentation](. /install/install-deploy.md) for Apache Doris installation and deployment to get a working Doris cluster (at least 1 FE 1 BE, 1 FE 3 BE recommended).
+Please first refer to the [official documentation](../install/install-deploy.md) to install and deploy Apache Doris and obtain a working Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs are recommended).
-The scripts covered in the following documentation are stored in the Apache Doris codebase: [ssb-tools](https://github.com/apache/doris/tree/master/tools/ssb-tools)
+The scripts mentioned below are stored in the Apache Doris codebase: [ssb-tools](https://github.com/apache/doris/tree/master/tools/ssb-tools)
 ## 7. Data Preparation
-### 7.1 Download and install the SSB data generation tool.
+### 7.1 Download and Install the SSB Data Generation Tool
 Execute the following script to download and compile the [ssb-dbgen](https://github.com/electrum/ssb-dbgen.git) tool.
@@ -143,9 +143,9 @@ Execute the following script to download and compile the [ssb-dbgen](https://git
 sh build-ssb-dbgen.sh
 ````
-After successful installation, the `dbgen` binary will be generated in the `ssb-dbgen/` directory.
+After successful installation, the `dbgen` binary is generated under the `ssb-dbgen/` directory.
-### 7.2 Generate SSB test set
+### 7.2 Generate the SSB Test Set
 Execute the following script to generate the SSB dataset:
@@ -153,31 +153,31 @@ Execute the following script to generate the SSB dataset:
 sh gen-ssb-data.sh -s 100 -c 100
 ````
-> Note 1: See script help with `sh gen-ssb-data.sh -h`.
+> Note 1: Check the script help via `sh gen-ssb-data.sh -h`.
 >
-> Note 2: The data will be generated in the `ssb-data/` directory with the suffix `.tbl`. The total file size is about 60GB. The generation time may vary from a few minutes to an hour.
+> Note 2: The data is generated under the `ssb-data/` directory with the suffix `.tbl`. The total file size is about 60GB, and generation may take from a few minutes to an hour.
 >
-> Note 3: `-s 100` indicates that the test set size factor is 100, `-c 100` indicates that 100 concurrent threads generate data for the lineorder table. The `-c` parameter also determines the number of files in the final lineorder table. The larger the parameter, the larger the number of files and the smaller each file.
+> Note 3: `-s 100` sets the test set scale factor to 100, and `-c 100` means that 100 concurrent threads generate the data of the lineorder table. The `-c` parameter also determines the number of files for the lineorder table: the larger the value, the more files are generated and the smaller each file is.
 With the `-s 100` parameter, the resulting dataset size is:
 | Table | Rows | Size | File Number |
 | --------- | ---------------- | ---- | ----------- |
-| lineorder | 600 million (600037902) | 60GB | 100 |
-| customer | 3 million (3000000) | 277M | 1 |
-| part | 1.4 million (1400000) | 116M | 1 |
-| supplier | 200 thousand (200000) | 17M | 1 |
-| date | 2556 | 228K | 1 |
+| lineorder | 600,037,902 | 60GB | 100 |
+| customer | 3,000,000 | 277M | 1 |
+| part | 1,400,000 | 116M | 1 |
+| supplier | 200,000 | 17M | 1 |
+| date | 2,556 | 228K | 1 |
-### 7.3 Create table
+### 7.3 Create Table
-#### 7.3.1 Prepare the `doris-cluster.conf` file.
+#### 7.3.1 Prepare the `doris-cluster.conf` File
-Before calling the import script, you need to write the FE's ip port and other information in the `doris-cluster.conf` file.
+Before running the import script, write the FE's IP, ports and other information into the `doris-cluster.conf` file.
-File location and `load-ssb-dimension-data.sh` level.
+The file is located in the same directory as `load-ssb-dimension-data.sh`.
-The contents of the file include FE's ip, HTTP port, user name, password and the DB name of the data to be imported:
+The file contains the FE's IP, HTTP port, user name, password and the name of the DB to import data into:
 ```shell
 export FE_HOST="xxx"
@@ -186,17 +186,17 @@ export FE_QUERY_PORT="9030"
 export USER="root"
 export PASSWORD='xxx'
 export DB="ssb"
-````
+```
-#### 7.3.2 Execute the following script to generate and create the SSB table:
+#### 7.3.2 Execute the Following Script to Generate and Create the SSB Tables
 ```shell
 sh create-ssb-tables.sh
 ````
-Or copy [create-ssb-tables.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql) and [ create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql) of the The create table statements are executed in the MySQL client.
+Or copy the table creation statements in [create-ssb-tables.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql) and [create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql) and execute them in the MySQL client.
-The following is the `lineorder_flat` table build statement. The `lineorder_flat` table is created in the `create-ssb-flat-table.sh` script above with the default number of buckets (48 buckets). You can delete this table and tune this bucketing number according to your cluster size node configuration to get better one test results.
+The following is the `lineorder_flat` table creation statement. The script above creates the `lineorder_flat` table with the default number of buckets (48). You can drop this table and adjust the number of buckets according to your cluster size and node configuration to obtain better test results.
 ```sql
 CREATE TABLE `lineorder_flat` (
@@ -260,21 +260,22 @@ PROPERTIES (
 ### 7.4 Import data
-We use the following command to complete the import of all data from SSB test set and SSB FLAT wide table data synthesis and import into the table.
+Use the following command to import all SSB test data, and to synthesize the SSB flat wide table data and import it into the table.
 ```shell
 sh bin/load-ssb-data.sh -c 10
 ```
-`-c 5` means start 10 concurrent threads for import (default is 5). In the single BE node case, the lineorder data generated by `sh gen-ssb-data.sh -s 100 -c 100` will also generate the data of the ssb-flat table at the end, if more threads are started, it can speed up the import, but it will add extra memory overhead.
+`-c 10` starts 10 concurrent threads for the import (5 by default). In the case of a single BE node, the lineorder data generated by `sh gen-ssb-data.sh -s 100 -c 100` is also used to generate the ssb-flat table data at the end. Enabling more threads speeds up the import but costs extra memory.
 > Notes.
 >
-> 1. This configuration indicates the number of write threads per data directory, and the default is 2. Larger data can improve write data throughput, but may increase IO Util. (Reference value: 1 mechanical disk, at default is 2, the IO Util during import is about 12%, and when set to 5, the IO Util is about 26%. (In case of SSD disks, it is almost 0).
->
-> 2. flat table data using 'INSERT INTO ... SELECT ... ' method to import.
+> 1. For faster imports, you can add `flush_thread_num_per_store=5` to be.conf and then restart the BE. This configuration sets the number of disk write threads per data directory (2 by default). A larger value can improve write throughput but may increase IO Util. (Reference values: with 1 mechanical disk, IO Util during import is about 12% with the default of 2 and about 26% with 5; on an SSD it is almost 0%.)
+>
+> 2. The flat table data is imported with `INSERT INTO ... SELECT ...`.
+
+### 7.5 Check the Imported Data
-### 7.5 Check imported data
 ```sql
 select count(*) from part;
@@ -285,20 +286,23 @@ select count(*) from lineorder;
 select count(*) from lineorder_flat;
 ```
-The amount of data should be the same as the number of rows that generate the data.
+The row count of each table should match the number of rows in the generated data.
 | Table | Rows | Origin Size | Compacted Size(1 Replica) |
 | -------------- | ---------------- | ----------- | ------------------------- |
-| lineorder_flat | 600 million (600037902) | | 59.709 GB |
-| lineorder | 600 million (600037902) | 60 GB | 14.514 GB |
-| customer | 3 million (3000000) | 277 MB | 138.247 MB |
-| part | 1.4 million (1400000) | 116 MB | 12.759 MB |
-| supplier | 200 thousand (200000) | 17 MB | 9.143 MB |
-| date | 2556 | 228 KB | 34.276 KB |
+| lineorder_flat | 600,037,902 | | 59.709 GB |
+| lineorder | 600,037,902 | 60 GB | 14.514 GB |
+| customer | 3,000,000 | 277 MB | 138.247 MB |
+| part | 1,400,000 | 116 MB | 12.759 MB |
+| supplier | 200,000 | 17 MB | 9.143 MB |
+| date | 2,556 | 228 KB | 34.276 KB |
+
+### 7.6 Query Test
-### 7.6 Query test
+- SSB flat queries: [ssb-flat-queries](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-flat-queries)
+- Standard SSB queries: [ssb-queries](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries)
-#### 7.6.1 SSB FLAT Test SQL
+#### 7.6.1 SSB Flat Test SQL
 ```sql
 --Q1.1
@@ -385,7 +389,7 @@ GROUP BY YEAR, S_CITY, P_BRAND
 ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;
 ```
-#### 7.6.2 SSB Standard Test SQL
+#### 7.6.2 Standard SSB Test SQL
 ```SQL
 --Q1.1
@@ -601,4 +605,3 @@ WHERE
 GROUP BY d_year, s_city, p_brand
 ORDER BY d_year, s_city, p_brand;
 ```
-
diff --git a/docs/benchmark/tpch.md b/docs/benchmark/tpch.md
index 06eea56247f..c7d679f94ac 100644
--- a/docs/benchmark/tpch.md
+++ b/docs/benchmark/tpch.md
@@ -24,52 +24,50 @@
 specific language governing permissions and limitations
 under the License.
 -->
-# TPC-H benchmark
+# TPC-H Benchmark
-TPC-H is a Decision Support Benchmark consisting of a set of business-oriented ad hoc queries and concurrent data modifications. The data that queries and populates the database has broad industry relevance. This benchmark demonstrates a decision support system that examines large amounts of data, executes highly complex queries, and answers critical business questions. The performance metric reported by TPC-H is called the TPC-H Hourly Compound Query Performance Metric (QphH@Size) and r [...]
+TPC-H is a decision support benchmark consisting of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have broad industry relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute highly complex queries, and answer critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric [...]
-This document mainly introduces the performance of Doris on the TPC-H test set.
+This document mainly introduces the performance of Doris on the TPC-H 100G test set.
-> Note 1: Standard test sets including TPC-H are usually far from actual business scenarios, and some tests will perform parameter tuning for the test set. Therefore, the test results of the standard test set can only reflect the performance of the database in specific scenarios. Users are advised to conduct further testing with actual business data.
+> Note 1: Standard test sets, including TPC-H, usually differ considerably from actual business scenarios, and some tests tune parameters specifically for the test set. Therefore, the results of a standard test set only reflect the performance of the database in specific scenarios. Users are advised to run further tests with actual business data.
 >
-> Note 2: The operations covered in this document are tested on CentOS 7.x.
+> Note 2: The operations involved in this document were all tested on CentOS 7.x.
-On 22 queries on the TPC-H standard test dataset, we conducted pairwise tests based on Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions. The overall performance of Apache Doris 1.2.0-rc01 is nearly 3 times better than that of Apache Doris 1.1.3 and nearly 11 times better than that of Apache Doris 0.15.0 RC04.
-
-
+We ran the 22 queries of the TPC-H standard test dataset to compare Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04. Overall, Apache Doris 1.2.0-rc01 is nearly 3 times faster than Apache Doris 1.1.3 and nearly 11 times faster than Apache Doris 0.15.0 RC04 (total query time of 19.64 s vs. 51.253 s and 223.33 s).
 ## 1. Hardware Environment
 | Hardware | Configuration Instructions |
 | -------- | ------------------------------------ |
-| number of machines | 4 Alibaba Cloud hosts (1 FE, 3 BE) |
+| Number of Machines | 4 Alibaba Cloud virtual machines (1 FE, 3 BEs) |
 | CPU | Intel Xeon(Cascade Lake) Platinum 8269CY 16C (2.5 GHz/3.2 GHz) |
 | Memory | 64G |
 | Network | 5Gbps |
-| Disk | ESSD cloud hard disk |
+| Disk | ESSD Cloud Disk |
 ## 2. Software Environment
-- Doris deploys 3BE 1FE;
-- Kernel version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
+- Doris deployed with 3 BEs and 1 FE
+- Kernel Version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
 - OS version: CentOS 7.8
 - Doris software version: Apache Doris 1.2.0-rc01、 Apache Doris 1.1.3 、 Apache Doris 0.15.0 RC04
 - JDK: openjdk version "11.0.14" 2022-01-18
 ## 3. Test Data Volume
-The entire test simulation generates 100G of data and is imported into Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions for testing. The following is the relevant description of the table and the amount of data.
+The 100 GB of TPC-H data generated for the test is imported into each of Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 for testing. The tables and their data volumes are described below.
-| TPC-H Table Name | Rows | data size | remark |
+| TPC-H Table Name | Rows | Size after Import | Description |
 | :--------------- | :---------- | ---------- | :----- |
-| REGION | 5 | 400KB | |
-| NATION | 25 | 7.714 KB | |
-| SUPPLIER | 100 million | 85.528 MB | |
-| PART | 20 million | 752.330 MB | |
-| PARTSUPP | 80 million | 4.375 GB | |
-| CUSTOMER | 15 million | 1.317 GB | |
-| ORDERS | 1.5 billion | 6.301 GB | |
-| LINEITEM | 6 billion | 20.882 GB | |
+| REGION | 5 | 400KB | Region |
+| NATION | 25 | 7.714 KB | Nation |
+| SUPPLIER | 1,000,000 | 85.528 MB | Supplier |
+| PART | 20,000,000 | 752.330 MB | Parts |
+| PARTSUPP | 80,000,000 | 4.375 GB | Parts Supply |
+| CUSTOMER | 15,000,000 | 1.317 GB | Customer |
+| ORDERS | 150,000,000 | 6.301 GB | Orders |
+| LINEITEM | 600,037,902 | 20.882 GB | Order Details |
 ## 4. Test SQL
@@ -77,7 +75,7 @@ TPCH 22 test query statements : [TPCH-Query-SQL](https://github.com/apache/inc
 **Notice:**
-The following four parameters in the above SQL are not present in Apache Doris 0.15.0 RC04 and are removed during execution.
+The following four parameters in the above SQL do not exist in Apache Doris 0.15.0 RC04; remove them when running the queries on that version:
 ```
 1. enable_vectorized_engine=true,
@@ -86,11 +84,11 @@ The following four parameters in the above SQL are not present in Apache Doris 0
 4. enable_projection=true
 ```
-## 5. Test Result
+## 5. Test Results
-Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions for comparison tests with the following results.
+We use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 for comparative testing, with query time (s) as the main performance indicator. The test results are as follows:
-| Query | Apache Doris 1.2.0-rc01 (s) | Apache Doris 1.1.3 (s) | Apache Doris 0.15.0 RC04 (s) |
+| Query | Apache Doris 1.2.0-rc01 (s) | Apache Doris 1.1.3 (s) | Apache Doris 0.15.0 RC04 (s) |
 | -------- | --------------------------- | ---------------------- | ---------------------------- |
 | Q1 | 2.12 | 3.75 | 28.63 |
 | Q2 | 0.20 | 4.22 | 7.88 |
@@ -116,11 +114,13 @@ Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0
 | Q22 | 0.46 | 0.9 | 3.22 |
 | **Total** | **19.64** | **51.253** | **223.33** |
-- **Result description**
-  - The data set corresponding to the test results is scale 100, about 600 million.
-  - The test environment is configured to be commonly used by users, including 4 cloud servers, 16-core 64G SSD, and 1 FE and 3 BE deployment.
-  - Use common user configuration tests to reduce user selection and evaluation costs, but will not consume so many hardware resources during the entire test process.
-  - Apache Doris 0.15 RC04 Q14 execution failed in TPC-H test, unable to complete query.
+
+
+- **Result Description**
+  - The dataset used for these results has a scale factor of 100 (about 600 million rows in the largest table, LINEITEM).
+  - The test environment uses a configuration common among users: 4 cloud servers (16 cores, 64 GB memory, SSD) with 1 FE and 3 BEs deployed.
+  - Testing on a common user configuration lowers the cost of hardware selection and evaluation for users; the test itself does not consume all of these hardware resources.
+  - Apache Doris 0.15 RC04 failed to execute Q14 in the TPC-H test and could not complete the query.
 ## 6. Environmental Preparation
@@ -128,7 +128,7 @@ Please refer to the [official document](../install/install-deploy.md) to install
 ## 7. Data Preparation
-### 7.1 Download and install the TPC-H data generation tool
+### 7.1 Download and Install the TPC-H Data Generation Tool
 Execute the following script to download and compile the [tpch-tools](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools) tool.
@@ -136,9 +136,9 @@ Execute the following script to download and compile the [tpch-tools](https://gi
 sh build-tpch-dbgen.sh
 ```
-After successful installation, the `dbgen` binary will be generated in the `TPC-H_Tools_v3.0.0/` directory.
+After successful installation, the `dbgen` binary is generated under the `TPC-H_Tools_v3.0.0/` directory.
-### 7.2 Generate TPC-H test set
+### 7.2 Generate the TPC-H Test Set
 Execute the following script to generate the TPC-H dataset:
@@ -146,21 +146,21 @@ Execute the following script to generate the TPC-H dataset:
 sh gen-tpch-data.sh
 ```
-> Note 1: View script help via `sh gen-tpch-data.sh -h`.
+> Note 1: Check the script help via `sh gen-tpch-data.sh -h`.
 >
-> Note 2: The data will be generated in the `tpch-data/` directory with the suffix `.tbl`. The total file size is about 100GB. The generation time may vary from a few minutes to an hour.
+> Note 2: The data is generated under the `tpch-data/` directory with the suffix `.tbl`. The total file size is about 100GB, and generation may take from a few minutes to an hour.
 >
-> Note 3: The standard test data set of 100G is generated by default
+> Note 3: A standard test data set of 100G is generated by default.
 ### 7.3 Create Table
-#### 7.3.1 Prepare the `doris-cluster.conf` file
+#### 7.3.1 Prepare the `doris-cluster.conf` File
-Before calling the import script, you need to write the FE's ip port and other information in the `doris-cluster.conf` file.
+Before running the import script, write the FE's IP, ports and other information into the `doris-cluster.conf` file.
-File location and `load-tpch-data.sh` level.
+The file is located in the same directory as `load-tpch-data.sh`.
-The contents of the file include FE's ip, HTTP port, user name, password and the DB name of the data to be imported:
+The file contains the FE's IP, HTTP port, user name, password and the name of the DB to import data into:
 ```shell
 # Any of FE host
@@ -177,17 +177,17 @@ export PASSWORD=''
 export DB='tpch1'
 ```
-#### 7.3.2 Execute the following script to generate and create the TPC-H table
+#### 7.3.2 Execute the Following Script to Generate and Create the TPC-H Tables
 ```shell
 sh create-tpch-tables.sh
 ```
-Or copy the table creation statement in [create-tpch-tables.sql](https://github.com/apache/incubator-doris/blob/master/tools/tpch-tools/create-tpch-tables.sql), Execute in Doris.
+Or copy the table creation statements in [create-tpch-tables.sql](https://github.com/apache/incubator-doris/blob/master/tools/tpch-tools/create-tpch-tables.sql) and execute them in Doris.
-### 7.4 Import data
+### 7.4 Import Data
-Execute the data import with the following command:
+Import the data with the following command:
 ```shell
 sh ./load-tpch-data.sh
@@ -221,14 +221,14 @@ Execute the above test SQL or execute the following command
 >Notice:
 >
->1. At present, the query optimizer and statistics functions of Doris are not perfect, so we rewrite some queries in TPC-H to adapt to the execution framework of Doris, but it does not affect the correctness of the results
+>1. At present, the query optimizer and statistics functions of Doris are not yet mature, so we rewrote some TPC-H queries to fit the Doris execution framework. This does not affect the correctness of the results.
 >
->2. Doris' new query optimizer will be released in subsequent versions
+>2. Doris' new query optimizer will be released in future versions.
 >3. Set `set mem_exec_limit=8G` before executing the query
 #### 7.6.2 Single SQL Execution
-The following is the SQL statement used in the test, you can also get the latest SQL from the code base. Latest test query statement address: [TPC-H test statement](https://github.com/apache/doris/tree master/tools/tpch-tools/query)
+The following are the SQL statements used in the test. You can also get the latest SQL from the code base.
 ```SQL
 --Q1
@@ -876,6 +876,3 @@ group by
 order by
     cntrycode;
 ```
-
-
-
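The "nearly N times" statements in this commit are ratios of total query times. As a cross-check, the following POSIX shell sketch (the helpers `column_total` and `speedup` are illustrative, not part of the Doris tools) sums the per-query columns of the standard SSB results table and divides each older version's total by the Apache Doris 1.2.0-rc01 total; for SSB flat and TPC-H it uses the Total rows as reported.

```shell
#!/bin/sh
# Sketch: recompute the speedup ratios quoted in the commit from its result tables.

# Sum column $1 (1 = Doris 1.2.0-rc01, 2 = 1.1.3, 3 = 0.15.0 RC04) of a markdown
# results table read from stdin. Only rows whose first cell looks like "Q<digit>"
# are summed, so header, separator and Total rows are skipped; thousands
# separators and padding spaces are stripped before adding.
column_total() {
  LC_ALL=C awk -F'|' -v col="$1" '
    $2 ~ /Q[0-9]/ { v = $(col + 2); gsub(/[ ,]/, "", v); sum += v }
    END { print sum }'
}

# Older version's total divided by the 1.2.0-rc01 total, one decimal place.
# LC_ALL=C keeps "." as the decimal separator in the output.
speedup() {
  LC_ALL=C awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", other / base }'
}

# Per-query times (ms) copied from the standard SSB results table.
STANDARD_SSB='
| Q1.1 | 40 | 18 | 350 |
| Q1.2 | 30 | 100 | 80 |
| Q1.3 | 20 | 70 | 80 |
| Q2.1 | 350 | 940 | 20,680 |
| Q2.2 | 320 | 750 | 18,250 |
| Q2.3 | 300 | 720 | 14,760 |
| Q3.1 | 650 | 2,150 | 22,190 |
| Q3.2 | 260 | 510 | 8,360 |
| Q3.3 | 220 | 450 | 6,200 |
| Q3.4 | 60 | 70 | 160 |
| Q4.1 | 840 | 1,480 | 24,320 |
| Q4.2 | 460 | 560 | 6,310 |
| Q4.3 | 610 | 660 | 10,170 |
'

# Column totals; these should equal the table's Total row (4160, 8478, 131910).
for col in 1 2 3; do
  printf '%s' "$STANDARD_SSB" | column_total "$col"
done

speedup 4160 8478      # standard SSB, 1.1.3 vs 1.2.0-rc01       -> 2.0
speedup 4160 131910    # standard SSB, 0.15.0 RC04 vs 1.2.0-rc01 -> 31.7
speedup 880 3150       # SSB flat totals (ms), 1.1.3             -> 3.6
speedup 880 8030       # SSB flat totals (ms), 0.15.0 RC04       -> 9.1
speedup 19.64 51.253   # TPC-H totals (s), 1.1.3                 -> 2.6
speedup 19.64 223.33   # TPC-H totals (s), 0.15.0 RC04           -> 11.4
```

With these numbers the ratios come out at about 3.6x and 9.1x for SSB flat, 2.0x and 31.7x for standard SSB, and 2.6x and 11.4x for TPC-H. The TPC-H total for 0.15.0 RC04 is used as reported, even though Q14 failed on that version.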
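Section 7.5 of the SSB document checks the import with one `select count(*)` per table. A sketch that automates the comparison follows, assuming the `mysql` command-line client is installed and the `doris-cluster.conf` variables (FE_HOST, FE_QUERY_PORT, USER, PASSWORD, DB) have been sourced into the shell. The function names are hypothetical, the expected counts are copied from the section 7.5 table, and the table names may need adjusting if your DDL differs.

```shell
#!/bin/sh
# Sketch: compare imported SSB row counts with the generated data (section 7.5).
# Assumptions (not part of ssb-tools): mysql client installed, cluster variables set.

# Expected row counts, copied from the table in section 7.5.
EXPECTED='lineorder_flat 600037902
lineorder 600037902
customer 3000000
part 1400000
supplier 200000
date 2556'

# Run one statement through the FE's MySQL-protocol port and print only the value.
# MYSQL_PWD passes the password (possibly empty) without an interactive prompt;
# stdin is redirected so mysql cannot consume the loop's input.
run_sql() {
  MYSQL_PWD="$PASSWORD" mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$USER" \
    -D "$DB" -N -s -e "$1" </dev/null
}

# Print every mismatch and return non-zero if any table's count is off.
check_counts() {
  status=0
  while read -r table expected; do
    actual=$(run_sql "select count(*) from \`$table\`")
    if [ "$actual" != "$expected" ]; then
      echo "MISMATCH $table: got '$actual', expected $expected"
      status=1
    fi
  done <<EOF
$EXPECTED
EOF
  return "$status"
}

# Usage (from the ssb-tools directory):
#   . ./doris-cluster.conf && check_counts && echo "all row counts match"
```

Keeping `run_sql` as a separate function lets the comparison be exercised without a cluster by redefining it; the same pattern can be reused for the TPC-H tables with their own expected counts.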
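Note 1 under section 7.4 of the SSB document suggests adding `flush_thread_num_per_store=5` to be.conf and restarting the BE. One way to make that edit idempotent is sketched below in POSIX shell; the function name `set_flush_threads` is hypothetical, the be.conf path depends on your deployment, and the BE restart is not shown.

```shell
#!/bin/sh
# Sketch: add or update flush_thread_num_per_store in be.conf.
# $1 = path to be.conf, $2 = thread count (defaults to 5, the value the note
# suggests; Doris itself defaults to 2). Restart the BE afterwards.
set_flush_threads() {
  conf=$1
  threads=${2:-5}
  key='flush_thread_num_per_store'
  if grep -q "^[[:space:]]*$key[[:space:]]*=" "$conf"; then
    # Rewrite the existing line via a temp file (portable alternative to sed -i).
    sed "s/^[[:space:]]*$key[[:space:]]*=.*/$key = $threads/" "$conf" > "$conf.tmp" &&
      mv "$conf.tmp" "$conf"
  else
    # No existing setting: append one.
    printf '%s = %s\n' "$key" "$threads" >> "$conf"
  fi
}
```

Rewriting an existing line instead of always appending keeps exactly one `flush_thread_num_per_store` entry in be.conf, so running the helper twice does not leave conflicting values.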