This is an automated email from the ASF dual-hosted git repository. diwu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 6c1f6859f47 update index docs structure (#1437) 6c1f6859f47 is described below commit 6c1f6859f4774ff77939fb038f62dd93a4122a22 Author: Kang <xiaok...@selectdb.com> AuthorDate: Tue Dec 31 17:43:29 2024 +0800 update index docs structure (#1437) ## Versions - [ ] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [ ] Chinese - [ ] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- docs/table-design/index/bloomfilter.md | 14 ++- docs/table-design/index/inverted-index.md | 47 ++++---- docs/table-design/index/ngram-bloomfilter-index.md | 21 +++- docs/table-design/index/prefix-index.md | 19 +++- .../current/table-design/index/bloomfilter.md | 12 ++- .../current/table-design/index/inverted-index.md | 69 +++++++----- .../table-design/index/ngram-bloomfilter-index.md | 19 +++- .../current/table-design/index/prefix-index.md | 26 ++++- .../version-2.0/table-design/index/bloomfilter.md | 16 ++- .../table-design/index/inverted-index.md | 51 +++++---- .../table-design/index/ngram-bloomfilter-index.md | 19 +++- .../version-2.0/table-design/index/prefix-index.md | 27 ++++- .../version-2.1/table-design/index/bloomfilter.md | 17 +-- .../table-design/index/inverted-index.md | 71 ++++++++----- .../table-design/index/ngram-bloomfilter-index.md | 19 +++- .../version-2.1/table-design/index/prefix-index.md | 27 ++++- .../version-3.0/table-design/index/bloomfilter.md | 12 ++- .../table-design/index/inverted-index.md | 46 +++++--- .../table-design/index/ngram-bloomfilter-index.md | 19 +++- .../version-3.0/table-design/index/prefix-index.md | 26 ++++- .../version-2.0/table-design/index/bloomfilter.md | 14 ++- .../table-design/index/inverted-index.md | 53 +++++---- .../table-design/index/ngram-bloomfilter-index.md | 21 +++- .../version-2.0/table-design/index/prefix-index.md | 19 +++- .../version-2.1/table-design/index/bloomfilter.md | 14 ++- .../table-design/index/inverted-index.md | 118 +++++++++++---------- .../table-design/index/ngram-bloomfilter-index.md | 21 +++- .../version-2.1/table-design/index/prefix-index.md | 19 +++- .../version-3.0/table-design/index/bloomfilter.md | 14 ++- .../table-design/index/inverted-index.md | 49 +++++---- .../table-design/index/ngram-bloomfilter-index.md | 21 +++- .../version-3.0/table-design/index/prefix-index.md | 19 +++- 32 files changed, 664 insertions(+), 295 deletions(-) diff --git a/docs/table-design/index/bloomfilter.md b/docs/table-design/index/bloomfilter.md index bbf569a6cd2..49094e3b853 100644 --- a/docs/table-design/index/bloomfilter.md +++ b/docs/table-design/index/bloomfilter.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The BloomFilter index is a type of skip list index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. +The BloomFilter index is a type of skip index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. BloomFilter is a fast lookup algorithm proposed by Bloom in 1970, which uses multiple hash functions. It is commonly used in scenarios where quick determination of whether an element belongs to a set is needed without requiring 100% accuracy. BloomFilter has the following characteristics: @@ -63,7 +63,7 @@ To check the effect of a BloomFilter index on a query, you can analyze relevant ::: -## Syntax +## Managing Indexes ### Creating a BloomFilter Index When Creating a Table @@ -97,12 +97,18 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## Using Indexes + +BloomFilter index is used to accelerate equality queries in the WHERE clause. It automatically takes effect when applicable, and there is no special syntax required. + +The acceleration effect of the BloomFilter index can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example Here is an example of how to create a BloomFilter index in Doris. -### Creating a BloomFilter Index - The BloomFilter index in Doris is created by adding the "bloom_filter_columns" property in the CREATE TABLE statement, with k1, k2, k3 being the key columns for the BloomFilter index. For example, the following creates BloomFilter indexes on saler_id and category_id. ```sql diff --git a/docs/table-design/index/inverted-index.md b/docs/table-design/index/inverted-index.md index c9a88571188..4b5adfa771c 100644 --- a/docs/table-design/index/inverted-index.md +++ b/docs/table-design/index/inverted-index.md @@ -83,18 +83,11 @@ There are some limitations to using inverted indexes: 3. DUPLICATE and UNIQUE table models with Merge-on-Write enabled support building inverted indexes on any column. However, AGGREGATE and UNIQUE models without Merge-on-Write enabled only support building inverted indexes on Key columns, as non-Key columns cannot have inverted indexes. This is because these two models require reading all data for merging, so indexes cannot be used for pre-filtering. -To see the effect of inverted indexes on a query, you can analyze relevant metrics in the Query Profile. - -- InvertedIndexFilterTime: time consumed by the inverted index - - InvertedIndexSearcherOpenTime: time to open the inverted index - - InvertedIndexSearcherSearchTime: time for internal queries of the inverted index - -- RowsInvertedIndexFiltered: number of rows filtered by the inverted index, can be compared with other Rows values to analyze the filtering effect of the BloomFilter index ::: -## Syntax +## Managing Indexes -### Define Inverted Indexes When Creating a Table +### Defining Inverted Indexes When Creating a Table In the table creation statement, after the COLUMN definition, is the index definition: @@ -265,6 +258,16 @@ ALTER TABLE table_name DROP INDEX idx_name; ::: +### Viewing Inverted Index + +-- Syntax 1: The INDEX section in the table schema with USING INVERTED indicates an inverted index +SHOW CREATE TABLE table_name; + +-- Syntax 2: IndexType as INVERTED indicates an inverted index +SHOW INDEX FROM idx_name; + +## Using Indexes + ### Accelerating Queries with Inverted Indexes ```sql @@ -320,27 +323,31 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### TOKENIZE Function +### Analyzing Index Acceleration Effects Through Profiles + +Inverted query acceleration can be toggled using the session variable `enable_inverted_index_query`, which is set to true by default. To verify the acceleration effect of the index, it can be set to false to turn it off. + +The acceleration effect of the inverted index can be analyzed using the following metrics in the Query Profile: +- RowsInvertedIndexFiltered: The number of rows filtered by the inverted index, which can be compared with other Rows values to analyze the filtering effect of the index. +- InvertedIndexFilterTime: The time consumed by the inverted index. + - InvertedIndexSearcherOpenTime: The time taken to open the inverted index. + - InvertedIndexSearcherSearchTime: The time taken for internal queries within the inverted index. + + +### Verifying Tokenization Effects Using Tokenization Functions To check the actual effect of tokenization or to tokenize a piece of text, you can use the `TOKENIZE` function for verification. The first parameter of the `TOKENIZE` function is the text to be tokenized, and the second parameter specifies the tokenization parameters used when creating the index. -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA 我爱我的祖国','"parser"="unicode"'); -+-------------------------------------------------------------------+ -| tokenize('I love CHINA 我爱我的祖国', '"parser"="unicode"') | -+-------------------------------------------------------------------+ -| ["i", "love", "china", "我", "爱", "我", "的", "祖", "国"] | -+-------------------------------------------------------------------+ -1 row in set (0.02 sec) ``` ## Usage Example diff --git a/docs/table-design/index/ngram-bloomfilter-index.md b/docs/table-design/index/ngram-bloomfilter-index.md index 71368ead8c0..be1a3a098c1 100644 --- a/docs/table-design/index/ngram-bloomfilter-index.md +++ b/docs/table-design/index/ngram-bloomfilter-index.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The NGram BloomFilter index, similar to the BloomFilter index, is a skip list index based on BloomFilter. +The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter. Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. Instead of storing the original text values, it tokenizes the text using NGram and stores each token in the BloomFilter. For LIKE queries, the pattern in LIKE '%pattern%' is also tokenized using NGram. Each token is checked against the BloomFilter, and if any token is not found, the corresponding data block does not meet the LIKE condition and can be skipped, reducing IO and accelerating th [...] @@ -43,7 +43,7 @@ The NGram BloomFilter index can only accelerate string LIKE queries, and the num ::: -## Syntax +## Managing Indexes ### Creating an NGram BloomFilter Index @@ -67,8 +67,14 @@ Explanation of the syntax: ### Viewing NGram BloomFilter Index +-- Syntax 1: The INDEX section in the table schema with USING NGRAM_BF indicates an inverted index ```sql -SHOW CREATE TABLE table_ngrambf; +SHOW CREATE TABLE table_name; +``` + +-- Syntax 2: IndexType as NGRAM_BF indicates an inverted index +```sql +SHOW INDEX FROM idx_name; ``` ### Deleting an NGram BloomFilter Index @@ -85,6 +91,15 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## Using Indexes + +NGram BloomFilter index is used to accelerate LIKE queries, for example: +SELECT count() FROM table1 WHERE message LIKE '%error%'; + +The acceleration effect of the BloomFilter index (including NGram) can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example This section demonstrates the usage and effectiveness of the NGram BloomFilter index using a dataset of Amazon product reviews, `amazon_reviews`. diff --git a/docs/table-design/index/prefix-index.md b/docs/table-design/index/prefix-index.md index 7801c9fce61..74d2e26ffb4 100644 --- a/docs/table-design/index/prefix-index.md +++ b/docs/table-design/index/prefix-index.md @@ -42,17 +42,30 @@ The first 36 bytes of a row in a data block are used as the prefix index for tha Prefix indexes can speed up equality queries and range queries. +## Managing Indexes + +There is no specific syntax to define a prefix index. When creating a table, the first 36 bytes of the table's Key are automatically taken as the prefix index. + +### Recommendations for Prefix Index Selection + :::tip -Since the KEY definition of a table is unique, a table can only have one type of prefix index. For queries using other columns that cannot hit the prefix index as conditions, the efficiency might not meet the requirements. There are two solutions: +Because the Key definition of a table is unique, a table has only one set of prefix indexes. Therefore, it is important to choose an appropriate prefix index when designing the table structure. The following recommendations can be considered: +1. Choose the fields most commonly used in WHERE filtering conditions as the Key. +2. Place the more frequently used fields at the front, as prefix indexes are only effective for fields in the WHERE condition that are part of the Key's prefix. + +For queries that use other columns not covered by the prefix index as conditions, the efficiency may not meet the requirements. There are two solutions: 1. Create an inverted index on the columns that require accelerated queries, as a table can have many inverted indexes. 2. For DUPLICATE tables, multi-prefix indexes can be indirectly achieved by creating corresponding strongly consistent materialized views with adjusted column orders. For more details, refer to query acceleration/materialized views. ::: -## Syntax +## Using Indexes + +Prefix indexes are used to accelerate equality and range queries in the WHERE clause. They automatically take effect when applicable, and there is no special syntax required. -There is no specific syntax for defining a prefix index. When creating a table, the first 36 bytes of the table's KEY are automatically taken as the prefix index. +The acceleration effect of the prefix index can be analyzed using the following metrics in the Query Profile: +- RowsKeyRangeFiltered: The number of rows filtered by the prefix index, which can be compared with other Rows values to analyze the filtering effect of the index. ## Example Usage diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/bloomfilter.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/bloomfilter.md index 5a77d68f27b..242ec1e95ac 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/bloomfilter.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/bloomfilter.md @@ -70,7 +70,7 @@ BloomFilter 的使用有下面一些限制: ::: -## 使用语法 +## 管理索引 ### 建表时创建 BloomFilter 索引 @@ -102,12 +102,18 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## 使用索引 + +BloomFilter 索引用于加速 WHERE 条件中的等值查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 下面通过实例来看看 Doris 怎么创建 BloomFilter 索引。 -### 创建 BloomFilter 索引 - Doris BloomFilter 索引的创建是通过在建表语句的 PROPERTIES 里加上 "bloom_filter_columns"="k1,k2,k3", 这个属性,k1,k2,k3 是要创建的 BloomFilter 索引的 Key 列名称,例如下面对表里的 saler_id,category_id 创建了 BloomFilter 索引。 ```sql diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index.md index 0df60acefe2..98dc525b983 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index.md @@ -47,7 +47,7 @@ under the License. - 支持短语查询 `MATCH_PHRASE` - 支持指定词距 `slop` - - 支持短语+前缀 `MATCH_PHRASE_PREFIX` + - 支持短语 + 前缀 `MATCH_PHRASE_PREFIX` - 支持分词正则查询 `MATCH_REGEXP` @@ -84,18 +84,9 @@ under the License. 3. DUPLICATE 和 开启 Merge-on-Write 的 UNIQUE 表模型支持任意列建倒排索引。但是 AGGREGATE 和 未开启 Merge-on-Write 的 UNIQUE 模型仅支持 Key 列建倒排索引,非 Key 列不能建倒排索引,这是因为这两个模型需要读取所有数据后做合并,因此不能利用索引做提前过滤。 - -如果要查看某个查询倒排索引效果,可以通过 Query Profile 中的相关指标进行分析。 - -- InvertedIndexFilterTime 是倒排索引消耗的时间 - - InvertedIndexSearcherOpenTime 是倒排索引打开索引的时间 - - InvertedIndexSearcherSearchTime 是倒排索引内部查询的时间 - -- RowsInvertedIndexFiltered 是倒排过滤掉的行数,可以与其他几个 Rows 值对比分析 BloomFilter 索引过滤效果 ::: - -## 使用语法 +## 管理索引 ### 建表时定义倒排索引 @@ -175,7 +166,7 @@ table_properties; <details> <summary>ignore_above</summary> - **用于指定不分词字符串索引(没有指定parser)的长度限制** + **用于指定不分词字符串索引(没有指定 parser)的长度限制** <p>- 长度超过 ignore_above 设置的字符串不会被索引。对于字符串数组,ignore_above 将分别应用于每个数组元素,长度超过 ignore_above 的字符串元素将不被索引。</p> <p>- 默认为 256,单位是字节</p> @@ -243,7 +234,7 @@ CANCEL BUILD INDEX ON table_name (job_id1,jobid_2,...); :::tip -`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是3。 +`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 2.0.12 和 2.1.4 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这两个版本开始通过失败和超时机制避免一直重试。3.0 存算分离模式暂不支持此命令。 @@ -265,10 +256,23 @@ ALTER TABLE table_name DROP INDEX idx_name; :::tip -`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是3。 +`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引删除任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 ::: +### 查看倒排索引 + +```sql +-- 语法 1,表的 schema 中 INDEX 部分 USING INVERTED 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 INVERTED 的是倒排索引 +SHOW INDEX FROM idx_name; +``` + + +## 使用索引 + ### 利用倒排索引加速查询 ```sql @@ -287,30 +291,30 @@ SELECT * FROM table_name WHERE content MATCH_ALL 'keyword1 keyword2'; -- 2. 全文检索短语匹配,通过 MATCH_PHRASE 完成 -- 2.1 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword2 必须紧跟在 keyword1 后面 --- 'keyword1 keyword2','wordx keyword1 keyword2','wordx keyword1 keyword2 wordy' 能匹配,因为他们都包含keyword1 keyword2,而且keyword2 紧跟在 keyword1 后面 +-- 'keyword1 keyword2','wordx keyword1 keyword2','wordx keyword1 keyword2 wordy' 能匹配,因为他们都包含 keyword1 keyword2,而且 keyword2 紧跟在 keyword1 后面 -- 'keyword1 wordx keyword2' 不能匹配,因为 keyword1 keyword2 之间隔了一个词 wordx -- 'keyword2 keyword1',因为 keyword1 keyword2 的顺序反了 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2'; --- 2.2 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword1 keyword2 的 `词距`(slop) 不超过3 --- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' 都能匹配,因为keyword1 keyword2中间隔的词分别是0 1 3 都不超过3 --- 'keyword1 a b c d keyword2' 不能能匹配,因为keyword1 keyword2中间隔的词有4个,超过3 --- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' 也能匹配,因为指定 slop > 0 时不再要求keyword1 keyword2 的顺序。这个行为参考了 ES,与直觉的预期不一样,因此 Doris 提供了在 slop 后面指定正数符号(+)表示需要保持 keyword1 keyword2 的先后顺序 +-- 2.2 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword1 keyword2 的 `词距`(slop)不超过 3 +-- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' 都能匹配,因为 keyword1 keyword2 中间隔的词分别是 0 1 3 都不超过 3 +-- 'keyword1 a b c d keyword2' 不能能匹配,因为 keyword1 keyword2 中间隔的词有 4 个,超过 3 +-- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' 也能匹配,因为指定 slop > 0 时不再要求 keyword1 keyword2 的顺序。这个行为参考了 ES,与直觉的预期不一样,因此 Doris 提供了在 slop 后面指定正数符号(+)表示需要保持 keyword1 keyword2 的先后顺序 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3'; -- slop 指定正号,'keyword1 a b c keyword2' 能匹配,而 'keyword2 a b c keyword1' 不能匹配 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3+'; --- 2.3 在保持词顺序的前提下,对最后一个词keyword2做前缀匹配,默认找50个前缀词(session变量inverted_index_max_expansions控制) --- 'keyword1 keyword2abc' 能匹配,因为keyword1完全一样,最后一个 keyword2abc 是 keyword2 的前缀 +-- 2.3 在保持词顺序的前提下,对最后一个词 keyword2 做前缀匹配,默认找 50 个前缀词(session 变量 inverted_index_max_expansions 控制) +-- 'keyword1 keyword2abc' 能匹配,因为 keyword1 完全一样,最后一个 keyword2abc 是 keyword2 的前缀 -- 'keyword1 keyword2' 也能匹配,因为 keyword2 也是 keyword2 的前缀 -- 'keyword1 keyword3' 不能匹配,因为 keyword3 不是 keyword2 的前缀 -- 'keyword1 keyword3abc' 也不能匹配,因为 keyword3abc 也不是 keyword2 的前缀 SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1 keyword2'; --- 2.4 如果只填一个词会退化为前缀查询,默认找50个前缀词(session变量inverted_index_max_expansions控制) +-- 2.4 如果只填一个词会退化为前缀查询,默认找 50 个前缀词(session 变量 inverted_index_max_expansions 控制) SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1'; --- 2.5 对分词后的词进行正则匹配,默认匹配50个(session变量inverted_index_max_expansions控制) +-- 2.5 对分词后的词进行正则匹配,默认匹配 50 个(session 变量 inverted_index_max_expansions 控制) -- 类似 MATCH_PHRASE_PREFIX 的匹配规则,只是前缀变成了正则 SELECT * FROM table_name WHERE content MATCH_REGEXP 'key*'; @@ -320,7 +324,18 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### 分词函数 +### 通过profile分析索引加速效果 + +倒排查询加速可以通过 session 变量 `enable_inverted_index_query` 开关,默认是 true 打开,有时为了验证索引加速效果可以设置为 false 关闭。 + +可以通过 Query Profile 中的下面几个指标分析倒排索引的加速效果。 +- RowsInvertedIndexFiltered 倒排过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- InvertedIndexFilterTime 倒排索引消耗的时间 + - InvertedIndexSearcherOpenTime 倒排索引打开索引的时间 + - InvertedIndexSearcherSearchTime 倒排索引内部查询的时间 + + +### 用分词函数验证分词效果 如果想检查分词实际效果或者对一段文本进行分词行为,可以使用 TOKENIZE 函数进行验证。 @@ -351,11 +366,11 @@ mysql> SELECT TOKENIZE('武汉市长江大桥','"parser"="chinese","parser_mode" +----------------------------------------------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md index 2aaca37b112..a9564ee82db 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md @@ -45,7 +45,7 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ::: -## 使用语法 +## 管理索引 ### 创建 NGram BloomFilter 索引 @@ -75,7 +75,11 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ### 查看 NGram BloomFilter 索引 ```sql -SHOW CREATE TABLE table_ngrambf; +-- 语法 1,表的 schema 中 INDEX 部分 USING NGRAM_BF 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 NGRAM_BF 的是倒排索引 +SHOW INDEX FROM idx_name; ``` ### 删除 NGram BloomFilter 索引 @@ -92,6 +96,17 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## 使用索引 + +NGram BloomFilter 索引用于加速 LIKE 查询,比如: +```sql +SELECT count() FROM table1 WHERE message LIKE '%error%'; +``` + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引(包括NGram)的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/prefix-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/prefix-index.md index f138bcf35ac..bbebd9c7763 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/prefix-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/prefix-index.md @@ -26,9 +26,9 @@ under the License. ## 索引原理 -Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 +Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 -在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Apache Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 +在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 在排序键的基础上,又引入了前缀索引(Prefix Index)。前缀索引是一种稀疏索引。表中按照相应的行数的数据构成一个逻辑数据块 (Data Block)。每个逻辑数据块在前缀索引表中存储一个索引项,索引项的长度不超过 36 字节,其内容为数据块中第一行数据的排序列组成的前缀,在查找前缀索引表时可以帮助确定该行数据所在逻辑数据块的起始行号。由于前缀索引比较小,所以,可以全量在内存缓存,快速定位数据块,大大提升了查询效率。 @@ -41,17 +41,33 @@ Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据 前缀索引可以加速等值查询和范围查询。 + +## 管理索引 + +前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 + +### 前缀索引选择建议 + :::tip -因为一个表的 Key 定义是唯一的,所以一个表只有一种前缀索引。这对于使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: +因为一个表的 Key 定义是唯一的,所以一个表只有一组前缀索引,因此设计表结构时选择合适的前缀索引很重要,可以参考下面的建议: +1. 选择查询中最常用于 WHERE 过滤条件的字段作为 Key。 +2. 约常用的字段越放在前面,因为前缀索引只对 WHERE 条件中字段在 Key 的前缀中才有效。 + + +使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: 1. 对需要加速查询的条件列创建倒排索引,由于一个表的倒排索引可以有很多个。 2. 对于 Duplicate 表可以通过创建相应的调整了列顺序的单表强一致物化视图来间接实现多种前缀索引,详情可参考查询加速/物化视图。 ::: -## 使用语法 -前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 +## 使用索引 + +前缀索引用于加速 WHERE 条件中的等值和范围查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析前缀索引的加速效果。 +- RowsKeyRangeFiltered 前缀索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 ## 使用示例 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/bloomfilter.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/bloomfilter.md index f1abc617947..242ec1e95ac 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/bloomfilter.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/bloomfilter.md @@ -49,7 +49,7 @@ Doris BloomFilter 索引以数据块(page)为单位构建,每个数据块 ## 使用场景 -BloomFilter 索引能够对等值查询(包括 = 和 IN)加速,对高基数字段效果较好,比如 userid 等唯一 id 字段。 +BloomFilter 索引能够对等值查询(包括 = 和 IN)加速,对高基数字段效果较好,比如 `userid` 等唯一 ID 字段。 :::tip @@ -70,7 +70,7 @@ BloomFilter 的使用有下面一些限制: ::: -## 使用语法 +## 管理索引 ### 建表时创建 BloomFilter 索引 @@ -102,13 +102,19 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## 使用索引 + +BloomFilter 索引用于加速 WHERE 条件中的等值查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 下面通过实例来看看 Doris 怎么创建 BloomFilter 索引。 -### 创建 BloomFilter 索引 - -Doris BloomFilter 索引的创建是通过在建表语句的 PROPERTIES 里加上 "bloom_filter_columns"="k1,k2,k3" 这个属性,k1,k2,k3 是要创建的 BloomFilter 索引的 Key 列名称,例如下面对表里的 saler_id,category_id 创建了 BloomFilter 索引。 +Doris BloomFilter 索引的创建是通过在建表语句的 PROPERTIES 里加上 "bloom_filter_columns"="k1,k2,k3", 这个属性,k1,k2,k3 是要创建的 BloomFilter 索引的 Key 列名称,例如下面对表里的 saler_id,category_id 创建了 BloomFilter 索引。 ```sql CREATE TABLE IF NOT EXISTS sale_detail_bloom ( diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md index f4828e09a96..00074fa14b3 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/inverted-index.md @@ -80,22 +80,13 @@ under the License. 1. 存在精度问题的浮点数类型 FLOAT 和 DOUBLE 不支持倒排索引,原因是浮点数精度不准确。解决方案是使用精度准确的定点数类型 DECIMAL,DECIMAL 支持倒排索引。 -2. 部分复杂数据类型还不支持倒排索引,包括:MAP、STRUCT、JSON、HLL、BITMAP、QUANTILE_STATE、AGG_STATE。其中,自 2.1.0 版本起,JSON 类型可以换成 VARIANT 类型获得支持,可参考 [VARIANT](https://doris.apache.org/docs/sql-manual/sql-types/Data-Types/VARIANT/);MAP、STRUCT 会逐步支持;其他几个类型因为其特殊用途暂不需要支持倒排索引。 +2. 部分复杂数据类型还不支持倒排索引,包括:MAP、STRUCT、JSON、HLL、BITMAP、QUANTILE_STATE、AGG_STATE。 3. DUPLICATE 和 开启 Merge-on-Write 的 UNIQUE 表模型支持任意列建倒排索引。但是 AGGREGATE 和 未开启 Merge-on-Write 的 UNIQUE 模型仅支持 Key 列建倒排索引,非 Key 列不能建倒排索引,这是因为这两个模型需要读取所有数据后做合并,因此不能利用索引做提前过滤。 - -如果要查看某个查询倒排索引效果,可以通过 Query Profile 中的相关指标进行分析。 - -- InvertedIndexFilterTime 是倒排索引消耗的时间 - - InvertedIndexSearcherOpenTime 是倒排索引打开索引的时间 - - InvertedIndexSearcherSearchTime 是倒排索引内部查询的时间 - -- RowsInvertedIndexFiltered 是倒排过滤掉的行数,可以与其他几个 Rows 值对比分析 BloomFilter 索引过滤效果 ::: - -## 使用语法 +## 管理索引 ### 建表时定义倒排索引 @@ -243,9 +234,9 @@ CANCEL BUILD INDEX ON table_name (job_id1,jobid_2,...); :::tip -`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引删除任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 +`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 -2.0.12 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这两个版本开始通过失败和超时机制避免一直重试。 +2.0.12 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这个版本开始通过失败和超时机制避免一直重试。 1. 一个 tablet 的多数副本 `BUILD INDEX` 失败后,整个 `BUILD INDEX` 失败结束 2. 时间超过 `alter_table_timeout_second` (),`BUILD INDEX` 超时结束 @@ -265,10 +256,23 @@ ALTER TABLE table_name DROP INDEX idx_name; :::tip -`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 +`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引删除任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 ::: +### 查看倒排索引 + +```sql +-- 语法 1,表的 schema 中 INDEX 部分 USING INVERTED 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 INVERTED 的是倒排索引 +SHOW INDEX FROM idx_name; +``` + + +## 使用索引 + ### 利用倒排索引加速查询 ```sql @@ -320,7 +324,18 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### 分词函数 +### 通过profile分析索引加速效果 + +倒排查询加速可以通过 session 变量 `enable_inverted_index_query` 开关,默认是 true 打开,有时为了验证索引加速效果可以设置为 false 关闭。 + +可以通过 Query Profile 中的下面几个指标分析倒排索引的加速效果。 +- RowsInvertedIndexFiltered 倒排过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- InvertedIndexFilterTime 倒排索引消耗的时间 + - InvertedIndexSearcherOpenTime 倒排索引打开索引的时间 + - InvertedIndexSearcherSearchTime 倒排索引内部查询的时间 + + +### 用分词函数验证分词效果 如果想检查分词实际效果或者对一段文本进行分词行为,可以使用 TOKENIZE 函数进行验证。 @@ -351,11 +366,11 @@ mysql> SELECT TOKENIZE('武汉市长江大桥','"parser"="chinese","parser_mode" +----------------------------------------------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/ngram-bloomfilter-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/ngram-bloomfilter-index.md index 2aaca37b112..a9564ee82db 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/ngram-bloomfilter-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/ngram-bloomfilter-index.md @@ -45,7 +45,7 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ::: -## 使用语法 +## 管理索引 ### 创建 NGram BloomFilter 索引 @@ -75,7 +75,11 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ### 查看 NGram BloomFilter 索引 ```sql -SHOW CREATE TABLE table_ngrambf; +-- 语法 1,表的 schema 中 INDEX 部分 USING NGRAM_BF 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 NGRAM_BF 的是倒排索引 +SHOW INDEX FROM idx_name; ``` ### 删除 NGram BloomFilter 索引 @@ -92,6 +96,17 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## 使用索引 + +NGram BloomFilter 索引用于加速 LIKE 查询,比如: +```sql +SELECT count() FROM table1 WHERE message LIKE '%error%'; +``` + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引(包括NGram)的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/prefix-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/prefix-index.md index c32e8d845c1..bbebd9c7763 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/prefix-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/index/prefix-index.md @@ -26,9 +26,9 @@ under the License. ## 索引原理 -Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 +Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 -在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Apache Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 +在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 在排序键的基础上,又引入了前缀索引(Prefix Index)。前缀索引是一种稀疏索引。表中按照相应的行数的数据构成一个逻辑数据块 (Data Block)。每个逻辑数据块在前缀索引表中存储一个索引项,索引项的长度不超过 36 字节,其内容为数据块中第一行数据的排序列组成的前缀,在查找前缀索引表时可以帮助确定该行数据所在逻辑数据块的起始行号。由于前缀索引比较小,所以,可以全量在内存缓存,快速定位数据块,大大提升了查询效率。 @@ -41,17 +41,33 @@ Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据 前缀索引可以加速等值查询和范围查询。 + +## 管理索引 + +前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 + +### 前缀索引选择建议 + :::tip -因为一个表的 Key 定义是唯一的,所以一个表只有一种前缀索引。这对于使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: +因为一个表的 Key 定义是唯一的,所以一个表只有一组前缀索引,因此设计表结构时选择合适的前缀索引很重要,可以参考下面的建议: +1. 选择查询中最常用于 WHERE 过滤条件的字段作为 Key。 +2. 约常用的字段越放在前面,因为前缀索引只对 WHERE 条件中字段在 Key 的前缀中才有效。 + + +使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: 1. 对需要加速查询的条件列创建倒排索引,由于一个表的倒排索引可以有很多个。 2. 对于 Duplicate 表可以通过创建相应的调整了列顺序的单表强一致物化视图来间接实现多种前缀索引,详情可参考查询加速/物化视图。 ::: -## 使用语法 -前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 +## 使用索引 + +前缀索引用于加速 WHERE 条件中的等值和范围查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析前缀索引的加速效果。 +- RowsKeyRangeFiltered 前缀索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 ## 使用示例 @@ -89,3 +105,4 @@ SELECT * FROM table WHERE age=20; ``` 所以在建表时,正确选择列顺序,能够极大地提高查询效率。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/bloomfilter.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/bloomfilter.md index 73021a4f56f..242ec1e95ac 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/bloomfilter.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/bloomfilter.md @@ -24,7 +24,6 @@ specific language governing permissions and limitations under the License. --> - ## 索引原理 @@ -50,7 +49,7 @@ Doris BloomFilter 索引以数据块(page)为单位构建,每个数据块 ## 使用场景 -BloomFilter 索引能够对等值查询(包括 = 和 IN)加速,对高基数字段效果较好,比如 userid 等唯一 id 字段。 +BloomFilter 索引能够对等值查询(包括 = 和 IN)加速,对高基数字段效果较好,比如 `userid` 等唯一 ID 字段。 :::tip @@ -71,7 +70,7 @@ BloomFilter 的使用有下面一些限制: ::: -## 使用语法 +## 管理索引 ### 建表时创建 BloomFilter 索引 @@ -103,13 +102,19 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## 使用索引 + +BloomFilter 索引用于加速 WHERE 条件中的等值查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 下面通过实例来看看 Doris 怎么创建 BloomFilter 索引。 -### 创建 BloomFilter 索引 - -Doris BloomFilter 索引的创建是通过在建表语句的 PROPERTIES 里加上 "bloom_filter_columns"="k1,k2,k3" 这个属性,k1,k2,k3 是要创建的 BloomFilter 索引的 Key 列名称,例如下面对表里的 saler_id,category_id 创建了 BloomFilter 索引。 +Doris BloomFilter 索引的创建是通过在建表语句的 PROPERTIES 里加上 "bloom_filter_columns"="k1,k2,k3", 这个属性,k1,k2,k3 是要创建的 BloomFilter 索引的 Key 列名称,例如下面对表里的 saler_id,category_id 创建了 BloomFilter 索引。 ```sql CREATE TABLE IF NOT EXISTS sale_detail_bloom ( diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/inverted-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/inverted-index.md index 6315802c463..43e12d40b58 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/inverted-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/inverted-index.md @@ -47,7 +47,7 @@ under the License. - 支持短语查询 `MATCH_PHRASE` - 支持指定词距 `slop` - - 支持短语+前缀 `MATCH_PHRASE_PREFIX` + - 支持短语 + 前缀 `MATCH_PHRASE_PREFIX` - 支持分词正则查询 `MATCH_REGEXP` @@ -84,18 +84,9 @@ under the License. 3. DUPLICATE 和 开启 Merge-on-Write 的 UNIQUE 表模型支持任意列建倒排索引。但是 AGGREGATE 和 未开启 Merge-on-Write 的 UNIQUE 模型仅支持 Key 列建倒排索引,非 Key 列不能建倒排索引,这是因为这两个模型需要读取所有数据后做合并,因此不能利用索引做提前过滤。 - -如果要查看某个查询倒排索引效果,可以通过 Query Profile 中的相关指标进行分析。 - -- InvertedIndexFilterTime 是倒排索引消耗的时间 - - InvertedIndexSearcherOpenTime 是倒排索引打开索引的时间 - - InvertedIndexSearcherSearchTime 是倒排索引内部查询的时间 - -- RowsInvertedIndexFiltered 是倒排过滤掉的行数,可以与其他几个 Rows 值对比分析 BloomFilter 索引过滤效果 ::: - -## 使用语法 +## 管理索引 ### 建表时定义倒排索引 @@ -175,7 +166,7 @@ table_properties; <details> <summary>ignore_above</summary> - **用于指定不分词字符串索引(没有指定parser)的长度限制** + **用于指定不分词字符串索引(没有指定 parser)的长度限制** <p>- 长度超过 ignore_above 设置的字符串不会被索引。对于字符串数组,ignore_above 将分别应用于每个数组元素,长度超过 ignore_above 的字符串元素将不被索引。</p> <p>- 默认为 256,单位是字节</p> @@ -243,9 +234,9 @@ CANCEL BUILD INDEX ON table_name (job_id1,jobid_2,...); :::tip -`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是3。 +`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 -2.1.4 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这两个版本开始通过失败和超时机制避免一直重试。 +2.1.4 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这个版本开始通过失败和超时机制避免一直重试。 1. 一个 tablet 的多数副本 `BUILD INDEX` 失败后,整个 `BUILD INDEX` 失败结束 2. 时间超过 `alter_table_timeout_second` (),`BUILD INDEX` 超时结束 @@ -265,10 +256,23 @@ ALTER TABLE table_name DROP INDEX idx_name; :::tip -`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引删除任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是3。 +`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引删除任务,通过 BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。 ::: +### 查看倒排索引 + +```sql +-- 语法 1,表的 schema 中 INDEX 部分 USING INVERTED 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 INVERTED 的是倒排索引 +SHOW INDEX FROM idx_name; +``` + + +## 使用索引 + ### 利用倒排索引加速查询 ```sql @@ -287,30 +291,30 @@ SELECT * FROM table_name WHERE content MATCH_ALL 'keyword1 keyword2'; -- 2. 全文检索短语匹配,通过 MATCH_PHRASE 完成 -- 2.1 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword2 必须紧跟在 keyword1 后面 --- 'keyword1 keyword2','wordx keyword1 keyword2','wordx keyword1 keyword2 wordy' 能匹配,因为他们都包含keyword1 keyword2,而且keyword2 紧跟在 keyword1 后面 +-- 'keyword1 keyword2','wordx keyword1 keyword2','wordx keyword1 keyword2 wordy' 能匹配,因为他们都包含 keyword1 keyword2,而且 keyword2 紧跟在 keyword1 后面 -- 'keyword1 wordx keyword2' 不能匹配,因为 keyword1 keyword2 之间隔了一个词 wordx -- 'keyword2 keyword1',因为 keyword1 keyword2 的顺序反了 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2'; --- 2.2 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword1 keyword2 的 `词距`(slop) 不超过3 --- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' 都能匹配,因为keyword1 keyword2中间隔的词分别是0 1 3 都不超过3 --- 'keyword1 a b c d keyword2' 不能能匹配,因为keyword1 keyword2中间隔的词有4个,超过3 --- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' 也能匹配,因为指定 slop > 0 时不再要求keyword1 keyword2 的顺序。这个行为参考了 ES,与直觉的预期不一样,因此 Doris 提供了在 slop 后面指定正数符号(+)表示需要保持 keyword1 keyword2 的先后顺序 +-- 2.2 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword1 keyword2 的 `词距`(slop)不超过 3 +-- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' 都能匹配,因为 keyword1 keyword2 中间隔的词分别是 0 1 3 都不超过 3 +-- 'keyword1 a b c d keyword2' 不能能匹配,因为 keyword1 keyword2 中间隔的词有 4 个,超过 3 +-- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' 也能匹配,因为指定 slop > 0 时不再要求 keyword1 keyword2 的顺序。这个行为参考了 ES,与直觉的预期不一样,因此 Doris 提供了在 slop 后面指定正数符号(+)表示需要保持 keyword1 keyword2 的先后顺序 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3'; -- slop 指定正号,'keyword1 a b c keyword2' 能匹配,而 'keyword2 a b c keyword1' 不能匹配 SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3+'; --- 2.3 在保持词顺序的前提下,对最后一个词keyword2做前缀匹配,默认找50个前缀词(session变量inverted_index_max_expansions控制) --- 'keyword1 keyword2abc' 能匹配,因为keyword1完全一样,最后一个 keyword2abc 是 keyword2 的前缀 +-- 2.3 在保持词顺序的前提下,对最后一个词 keyword2 做前缀匹配,默认找 50 个前缀词(session 变量 inverted_index_max_expansions 控制) +-- 'keyword1 keyword2abc' 能匹配,因为 keyword1 完全一样,最后一个 keyword2abc 是 keyword2 的前缀 -- 'keyword1 keyword2' 也能匹配,因为 keyword2 也是 keyword2 的前缀 -- 'keyword1 keyword3' 不能匹配,因为 keyword3 不是 keyword2 的前缀 -- 'keyword1 keyword3abc' 也不能匹配,因为 keyword3abc 也不是 keyword2 的前缀 SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1 keyword2'; --- 2.4 如果只填一个词会退化为前缀查询,默认找50个前缀词(session变量inverted_index_max_expansions控制) +-- 2.4 如果只填一个词会退化为前缀查询,默认找 50 个前缀词(session 变量 inverted_index_max_expansions 控制) SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1'; --- 2.5 对分词后的词进行正则匹配,默认匹配50个(session变量inverted_index_max_expansions控制) +-- 2.5 对分词后的词进行正则匹配,默认匹配 50 个(session 变量 inverted_index_max_expansions 控制) -- 类似 MATCH_PHRASE_PREFIX 的匹配规则,只是前缀变成了正则 SELECT * FROM table_name WHERE content MATCH_REGEXP 'key*'; @@ -320,7 +324,18 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### 分词函数 +### 通过profile分析索引加速效果 + +倒排查询加速可以通过 session 变量 `enable_inverted_index_query` 开关,默认是 true 打开,有时为了验证索引加速效果可以设置为 false 关闭。 + +可以通过 Query Profile 中的下面几个指标分析倒排索引的加速效果。 +- RowsInvertedIndexFiltered 倒排过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- InvertedIndexFilterTime 倒排索引消耗的时间 + - InvertedIndexSearcherOpenTime 倒排索引打开索引的时间 + - InvertedIndexSearcherSearchTime 倒排索引内部查询的时间 + + +### 用分词函数验证分词效果 如果想检查分词实际效果或者对一段文本进行分词行为,可以使用 TOKENIZE 函数进行验证。 @@ -351,11 +366,11 @@ mysql> SELECT TOKENIZE('武汉市长江大桥','"parser"="chinese","parser_mode" +----------------------------------------------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md index 2aaca37b112..a9564ee82db 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md @@ -45,7 +45,7 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ::: -## 使用语法 +## 管理索引 ### 创建 NGram BloomFilter 索引 @@ -75,7 +75,11 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ### 查看 NGram BloomFilter 索引 ```sql -SHOW CREATE TABLE table_ngrambf; +-- 语法 1,表的 schema 中 INDEX 部分 USING NGRAM_BF 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 NGRAM_BF 的是倒排索引 +SHOW INDEX FROM idx_name; ``` ### 删除 NGram BloomFilter 索引 @@ -92,6 +96,17 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## 使用索引 + +NGram BloomFilter 索引用于加速 LIKE 查询,比如: +```sql +SELECT count() FROM table1 WHERE message LIKE '%error%'; +``` + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引(包括NGram)的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/prefix-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/prefix-index.md index c32e8d845c1..bbebd9c7763 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/prefix-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/prefix-index.md @@ -26,9 +26,9 @@ under the License. ## 索引原理 -Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 +Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 -在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Apache Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 +在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 在排序键的基础上,又引入了前缀索引(Prefix Index)。前缀索引是一种稀疏索引。表中按照相应的行数的数据构成一个逻辑数据块 (Data Block)。每个逻辑数据块在前缀索引表中存储一个索引项,索引项的长度不超过 36 字节,其内容为数据块中第一行数据的排序列组成的前缀,在查找前缀索引表时可以帮助确定该行数据所在逻辑数据块的起始行号。由于前缀索引比较小,所以,可以全量在内存缓存,快速定位数据块,大大提升了查询效率。 @@ -41,17 +41,33 @@ Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据 前缀索引可以加速等值查询和范围查询。 + +## 管理索引 + +前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 + +### 前缀索引选择建议 + :::tip -因为一个表的 Key 定义是唯一的,所以一个表只有一种前缀索引。这对于使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: +因为一个表的 Key 定义是唯一的,所以一个表只有一组前缀索引,因此设计表结构时选择合适的前缀索引很重要,可以参考下面的建议: +1. 选择查询中最常用于 WHERE 过滤条件的字段作为 Key。 +2. 约常用的字段越放在前面,因为前缀索引只对 WHERE 条件中字段在 Key 的前缀中才有效。 + + +使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: 1. 对需要加速查询的条件列创建倒排索引,由于一个表的倒排索引可以有很多个。 2. 对于 Duplicate 表可以通过创建相应的调整了列顺序的单表强一致物化视图来间接实现多种前缀索引,详情可参考查询加速/物化视图。 ::: -## 使用语法 -前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 +## 使用索引 + +前缀索引用于加速 WHERE 条件中的等值和范围查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析前缀索引的加速效果。 +- RowsKeyRangeFiltered 前缀索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 ## 使用示例 @@ -89,3 +105,4 @@ SELECT * FROM table WHERE age=20; ``` 所以在建表时,正确选择列顺序,能够极大地提高查询效率。 + diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/bloomfilter.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/bloomfilter.md index 5a77d68f27b..242ec1e95ac 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/bloomfilter.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/bloomfilter.md @@ -70,7 +70,7 @@ BloomFilter 的使用有下面一些限制: ::: -## 使用语法 +## 管理索引 ### 建表时创建 BloomFilter 索引 @@ -102,12 +102,18 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## 使用索引 + +BloomFilter 索引用于加速 WHERE 条件中的等值查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 下面通过实例来看看 Doris 怎么创建 BloomFilter 索引。 -### 创建 BloomFilter 索引 - Doris BloomFilter 索引的创建是通过在建表语句的 PROPERTIES 里加上 "bloom_filter_columns"="k1,k2,k3", 这个属性,k1,k2,k3 是要创建的 BloomFilter 索引的 Key 列名称,例如下面对表里的 saler_id,category_id 创建了 BloomFilter 索引。 ```sql diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md index 672a13343ff..98dc525b983 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/inverted-index.md @@ -84,18 +84,9 @@ under the License. 3. DUPLICATE 和 开启 Merge-on-Write 的 UNIQUE 表模型支持任意列建倒排索引。但是 AGGREGATE 和 未开启 Merge-on-Write 的 UNIQUE 模型仅支持 Key 列建倒排索引,非 Key 列不能建倒排索引,这是因为这两个模型需要读取所有数据后做合并,因此不能利用索引做提前过滤。 - -如果要查看某个查询倒排索引效果,可以通过 Query Profile 中的相关指标进行分析。 - -- InvertedIndexFilterTime 是倒排索引消耗的时间 - - InvertedIndexSearcherOpenTime 是倒排索引打开索引的时间 - - InvertedIndexSearcherSearchTime 是倒排索引内部查询的时间 - -- RowsInvertedIndexFiltered 是倒排过滤掉的行数,可以与其他几个 Rows 值对比分析 BloomFilter 索引过滤效果 ::: - -## 使用语法 +## 管理索引 ### 建表时定义倒排索引 @@ -269,6 +260,19 @@ ALTER TABLE table_name DROP INDEX idx_name; ::: +### 查看倒排索引 + +```sql +-- 语法 1,表的 schema 中 INDEX 部分 USING INVERTED 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 INVERTED 的是倒排索引 +SHOW INDEX FROM idx_name; +``` + + +## 使用索引 + ### 利用倒排索引加速查询 ```sql @@ -320,7 +324,18 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### 分词函数 +### 通过profile分析索引加速效果 + +倒排查询加速可以通过 session 变量 `enable_inverted_index_query` 开关,默认是 true 打开,有时为了验证索引加速效果可以设置为 false 关闭。 + +可以通过 Query Profile 中的下面几个指标分析倒排索引的加速效果。 +- RowsInvertedIndexFiltered 倒排过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- InvertedIndexFilterTime 倒排索引消耗的时间 + - InvertedIndexSearcherOpenTime 倒排索引打开索引的时间 + - InvertedIndexSearcherSearchTime 倒排索引内部查询的时间 + + +### 用分词函数验证分词效果 如果想检查分词实际效果或者对一段文本进行分词行为,可以使用 TOKENIZE 函数进行验证。 @@ -351,11 +366,11 @@ mysql> SELECT TOKENIZE('武汉市长江大桥','"parser"="chinese","parser_mode" +----------------------------------------------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) @@ -415,8 +430,7 @@ PROPERTIES ("replication_num" = "1"); **通过 Stream Load 导入数据** ``` - -wget https://doris-build-1308700295.cos.ap-beijing.myqcloud.com/regression/index/hacknernews_1m.csv.gz +wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load { diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md index 2aaca37b112..a9564ee82db 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md @@ -45,7 +45,7 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ::: -## 使用语法 +## 管理索引 ### 创建 NGram BloomFilter 索引 @@ -75,7 +75,11 @@ NGram BloomFilter 索引只能加速字符串 LIKE 查询,而且 LIKE pattern ### 查看 NGram BloomFilter 索引 ```sql -SHOW CREATE TABLE table_ngrambf; +-- 语法 1,表的 schema 中 INDEX 部分 USING NGRAM_BF 是倒排索引 +SHOW CREATE TABLE table_name; + +-- 语法 2,IndexType 为 NGRAM_BF 的是倒排索引 +SHOW INDEX FROM idx_name; ``` ### 删除 NGram BloomFilter 索引 @@ -92,6 +96,17 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## 使用索引 + +NGram BloomFilter 索引用于加速 LIKE 查询,比如: +```sql +SELECT count() FROM table1 WHERE message LIKE '%error%'; +``` + +可以通过 Query Profile 中的下面几个指标分析 BloomFilter 索引(包括NGram)的加速效果。 +- RowsBloomFilterFiltered BloomFilter 索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 +- BlockConditionsFilteredBloomFilterTime BloomFilter 倒排索引消耗的时间 + ## 使用示例 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/prefix-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/prefix-index.md index f138bcf35ac..bbebd9c7763 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/prefix-index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/prefix-index.md @@ -26,9 +26,9 @@ under the License. ## 索引原理 -Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 +Doris 的数据存储在类似 SSTable(Sorted String Table)的数据结构中。该结构是一种有序的数据结构,可以按照指定的一个或多个列进行排序存储。在这种数据结构上,以排序列的全部或者前面几个作为条件进行查找,会非常的高效。 -在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Apache Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 +在 Aggregate、Unique 和 Duplicate 三种数据模型中。底层的数据存储,是按照各自建表语句中,Aggregate Key、Unique Key 和 Duplicate Key 中指定的列进行排序存储的。这些 Key,称为排序键(Sort Key)。借助排序键,在查询时,通过给排序列指定条件,Doris 不需要扫描全表即可快速找到需要处理的数据,降低搜索的复杂度,从而加速查询。 在排序键的基础上,又引入了前缀索引(Prefix Index)。前缀索引是一种稀疏索引。表中按照相应的行数的数据构成一个逻辑数据块 (Data Block)。每个逻辑数据块在前缀索引表中存储一个索引项,索引项的长度不超过 36 字节,其内容为数据块中第一行数据的排序列组成的前缀,在查找前缀索引表时可以帮助确定该行数据所在逻辑数据块的起始行号。由于前缀索引比较小,所以,可以全量在内存缓存,快速定位数据块,大大提升了查询效率。 @@ -41,17 +41,33 @@ Apache Doris 的数据存储在类似 SSTable(Sorted String Table)的数据 前缀索引可以加速等值查询和范围查询。 + +## 管理索引 + +前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 + +### 前缀索引选择建议 + :::tip -因为一个表的 Key 定义是唯一的,所以一个表只有一种前缀索引。这对于使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: +因为一个表的 Key 定义是唯一的,所以一个表只有一组前缀索引,因此设计表结构时选择合适的前缀索引很重要,可以参考下面的建议: +1. 选择查询中最常用于 WHERE 过滤条件的字段作为 Key。 +2. 约常用的字段越放在前面,因为前缀索引只对 WHERE 条件中字段在 Key 的前缀中才有效。 + + +使用其他不能命中前缀索引的列作为条件进行的查询来说,效率上可能无法满足需求,有两种解决方案: 1. 对需要加速查询的条件列创建倒排索引,由于一个表的倒排索引可以有很多个。 2. 对于 Duplicate 表可以通过创建相应的调整了列顺序的单表强一致物化视图来间接实现多种前缀索引,详情可参考查询加速/物化视图。 ::: -## 使用语法 -前缀索引没有专门的语法去定义,建表时自动取表的 Key 的前 36 字节作为前缀索引。 +## 使用索引 + +前缀索引用于加速 WHERE 条件中的等值和范围查询,能加速时自动生效,没有特殊语法。 + +可以通过 Query Profile 中的下面几个指标分析前缀索引的加速效果。 +- RowsKeyRangeFiltered 前缀索引过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果 ## 使用示例 diff --git a/versioned_docs/version-2.0/table-design/index/bloomfilter.md b/versioned_docs/version-2.0/table-design/index/bloomfilter.md index bbf569a6cd2..49094e3b853 100644 --- a/versioned_docs/version-2.0/table-design/index/bloomfilter.md +++ b/versioned_docs/version-2.0/table-design/index/bloomfilter.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The BloomFilter index is a type of skip list index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. +The BloomFilter index is a type of skip index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. BloomFilter is a fast lookup algorithm proposed by Bloom in 1970, which uses multiple hash functions. It is commonly used in scenarios where quick determination of whether an element belongs to a set is needed without requiring 100% accuracy. BloomFilter has the following characteristics: @@ -63,7 +63,7 @@ To check the effect of a BloomFilter index on a query, you can analyze relevant ::: -## Syntax +## Managing Indexes ### Creating a BloomFilter Index When Creating a Table @@ -97,12 +97,18 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## Using Indexes + +BloomFilter index is used to accelerate equality queries in the WHERE clause. It automatically takes effect when applicable, and there is no special syntax required. + +The acceleration effect of the BloomFilter index can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example Here is an example of how to create a BloomFilter index in Doris. -### Creating a BloomFilter Index - The BloomFilter index in Doris is created by adding the "bloom_filter_columns" property in the CREATE TABLE statement, with k1, k2, k3 being the key columns for the BloomFilter index. For example, the following creates BloomFilter indexes on saler_id and category_id. ```sql diff --git a/versioned_docs/version-2.0/table-design/index/inverted-index.md b/versioned_docs/version-2.0/table-design/index/inverted-index.md index 4524bc05dd4..11fe799e9e8 100644 --- a/versioned_docs/version-2.0/table-design/index/inverted-index.md +++ b/versioned_docs/version-2.0/table-design/index/inverted-index.md @@ -79,22 +79,15 @@ There are some limitations to using inverted indexes: 1. Floating-point types FLOAT and DOUBLE, which have precision issues, do not support inverted indexes due to inaccurate precision. The solution is to use the precisely accurate DECIMAL type, which supports inverted indexes. -2. Some complex data types do not yet support inverted indexes, including MAP, STRUCT, JSON, HLL, BITMAP, QUANTILE_STATE, AGG_STATE. Among these data types, JSON can be replaced with the VARIANT type since Apache Doris 2.1.0 version (for more information, see [VARIANT](https://doris.apache.org/docs/sql-manual/sql-types/Data-Types/VARIANT)), MAP and STRUCT will gradually gain support, and the other types do not need support for inverted indexes due to their specific uses. +2. Some complex data types do not yet support inverted indexes, including MAP, STRUCT, JSON, HLL, BITMAP, QUANTILE_STATE, AGG_STATE. 3. DUPLICATE and UNIQUE table models with Merge-on-Write enabled support building inverted indexes on any column. However, AGGREGATE and UNIQUE models without Merge-on-Write enabled only support building inverted indexes on Key columns, as non-Key columns cannot have inverted indexes. This is because these two models require reading all data for merging, so indexes cannot be used for pre-filtering. -To see the effect of inverted indexes on a query, you can analyze relevant metrics in the Query Profile. - -- InvertedIndexFilterTime: time consumed by the inverted index - - InvertedIndexSearcherOpenTime: time to open the inverted index - - InvertedIndexSearcherSearchTime: time for internal queries of the inverted index - -- RowsInvertedIndexFiltered: number of rows filtered by the inverted index, can be compared with other Rows values to analyze the filtering effect of the BloomFilter index ::: -## Syntax +## Managing Indexes -### Define Inverted Indexes When Creating a Table +### Defining Inverted Indexes When Creating a Table In the table creation statement, after the COLUMN definition, is the index definition: @@ -184,7 +177,7 @@ Syntax explanation: **Whether to convert tokens to lowercase for case-insensitive matching** <p>- true: convert to lowercase</p> <p>- false: do not convert to lowercase</p> - <p>- From version 2.0.7, the default is true, automatically converting to lowercase. Earlier versions default to false.</p> + <p>- From versions 2.0.7, the default is true, automatically converting to lowercase. Earlier versions default to false.</p> </details> <details> @@ -242,7 +235,7 @@ CANCEL BUILD INDEX ON table_name (job_id1, job_id2, ...); `BUILD INDEX` creates an asynchronous task executed by multiple threads on each BE. The number of threads can be set using the BE config `alter_index_worker_count`, with a default value of 3. -In versions before 2.0.12, `BUILD INDEX` would keep retrying until it succeeded. Starting from these versions, failure and timeout mechanisms prevent endless retries. +In versions before 2.0.12, `BUILD INDEX` would keep retrying until it succeeded. Starting from this version, failure and timeout mechanisms prevent endless retries. 1. If the majority of replicas for a tablet fail to `BUILD INDEX`, the entire `BUILD INDEX` operation fails. 2. If the time exceeds `alter_table_timeout_second`, the `BUILD INDEX` operation times out. @@ -265,6 +258,16 @@ ALTER TABLE table_name DROP INDEX idx_name; ::: +### Viewing Inverted Index + +-- Syntax 1: The INDEX section in the table schema with USING INVERTED indicates an inverted index +SHOW CREATE TABLE table_name; + +-- Syntax 2: IndexType as INVERTED indicates an inverted index +SHOW INDEX FROM idx_name; + +## Using Indexes + ### Accelerating Queries with Inverted Indexes ```sql @@ -320,27 +323,31 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### TOKENIZE Function +### Analyzing Index Acceleration Effects Through Profiles + +Inverted query acceleration can be toggled using the session variable `enable_inverted_index_query`, which is set to true by default. To verify the acceleration effect of the index, it can be set to false to turn it off. + +The acceleration effect of the inverted index can be analyzed using the following metrics in the Query Profile: +- RowsInvertedIndexFiltered: The number of rows filtered by the inverted index, which can be compared with other Rows values to analyze the filtering effect of the index. +- InvertedIndexFilterTime: The time consumed by the inverted index. + - InvertedIndexSearcherOpenTime: The time taken to open the inverted index. + - InvertedIndexSearcherSearchTime: The time taken for internal queries within the inverted index. + + +### Verifying Tokenization Effects Using Tokenization Functions To check the actual effect of tokenization or to tokenize a piece of text, you can use the `TOKENIZE` function for verification. The first parameter of the `TOKENIZE` function is the text to be tokenized, and the second parameter specifies the tokenization parameters used when creating the index. -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA 我爱我的祖国','"parser"="unicode"'); -+-------------------------------------------------------------------+ -| tokenize('I love CHINA 我爱我的祖国', '"parser"="unicode"') | -+-------------------------------------------------------------------+ -| ["i", "love", "china", "我", "爱", "我", "的", "祖", "国"] | -+-------------------------------------------------------------------+ -1 row in set (0.02 sec) ``` ## Usage Example diff --git a/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md b/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md index 71368ead8c0..be1a3a098c1 100644 --- a/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md +++ b/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The NGram BloomFilter index, similar to the BloomFilter index, is a skip list index based on BloomFilter. +The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter. Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. Instead of storing the original text values, it tokenizes the text using NGram and stores each token in the BloomFilter. For LIKE queries, the pattern in LIKE '%pattern%' is also tokenized using NGram. Each token is checked against the BloomFilter, and if any token is not found, the corresponding data block does not meet the LIKE condition and can be skipped, reducing IO and accelerating th [...] @@ -43,7 +43,7 @@ The NGram BloomFilter index can only accelerate string LIKE queries, and the num ::: -## Syntax +## Managing Indexes ### Creating an NGram BloomFilter Index @@ -67,8 +67,14 @@ Explanation of the syntax: ### Viewing NGram BloomFilter Index +-- Syntax 1: The INDEX section in the table schema with USING NGRAM_BF indicates an inverted index ```sql -SHOW CREATE TABLE table_ngrambf; +SHOW CREATE TABLE table_name; +``` + +-- Syntax 2: IndexType as NGRAM_BF indicates an inverted index +```sql +SHOW INDEX FROM idx_name; ``` ### Deleting an NGram BloomFilter Index @@ -85,6 +91,15 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## Using Indexes + +NGram BloomFilter index is used to accelerate LIKE queries, for example: +SELECT count() FROM table1 WHERE message LIKE '%error%'; + +The acceleration effect of the BloomFilter index (including NGram) can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example This section demonstrates the usage and effectiveness of the NGram BloomFilter index using a dataset of Amazon product reviews, `amazon_reviews`. diff --git a/versioned_docs/version-2.0/table-design/index/prefix-index.md b/versioned_docs/version-2.0/table-design/index/prefix-index.md index 7801c9fce61..74d2e26ffb4 100644 --- a/versioned_docs/version-2.0/table-design/index/prefix-index.md +++ b/versioned_docs/version-2.0/table-design/index/prefix-index.md @@ -42,17 +42,30 @@ The first 36 bytes of a row in a data block are used as the prefix index for tha Prefix indexes can speed up equality queries and range queries. +## Managing Indexes + +There is no specific syntax to define a prefix index. When creating a table, the first 36 bytes of the table's Key are automatically taken as the prefix index. + +### Recommendations for Prefix Index Selection + :::tip -Since the KEY definition of a table is unique, a table can only have one type of prefix index. For queries using other columns that cannot hit the prefix index as conditions, the efficiency might not meet the requirements. There are two solutions: +Because the Key definition of a table is unique, a table has only one set of prefix indexes. Therefore, it is important to choose an appropriate prefix index when designing the table structure. The following recommendations can be considered: +1. Choose the fields most commonly used in WHERE filtering conditions as the Key. +2. Place the more frequently used fields at the front, as prefix indexes are only effective for fields in the WHERE condition that are part of the Key's prefix. + +For queries that use other columns not covered by the prefix index as conditions, the efficiency may not meet the requirements. There are two solutions: 1. Create an inverted index on the columns that require accelerated queries, as a table can have many inverted indexes. 2. For DUPLICATE tables, multi-prefix indexes can be indirectly achieved by creating corresponding strongly consistent materialized views with adjusted column orders. For more details, refer to query acceleration/materialized views. ::: -## Syntax +## Using Indexes + +Prefix indexes are used to accelerate equality and range queries in the WHERE clause. They automatically take effect when applicable, and there is no special syntax required. -There is no specific syntax for defining a prefix index. When creating a table, the first 36 bytes of the table's KEY are automatically taken as the prefix index. +The acceleration effect of the prefix index can be analyzed using the following metrics in the Query Profile: +- RowsKeyRangeFiltered: The number of rows filtered by the prefix index, which can be compared with other Rows values to analyze the filtering effect of the index. ## Example Usage diff --git a/versioned_docs/version-2.1/table-design/index/bloomfilter.md b/versioned_docs/version-2.1/table-design/index/bloomfilter.md index bbf569a6cd2..49094e3b853 100644 --- a/versioned_docs/version-2.1/table-design/index/bloomfilter.md +++ b/versioned_docs/version-2.1/table-design/index/bloomfilter.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The BloomFilter index is a type of skip list index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. +The BloomFilter index is a type of skip index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. BloomFilter is a fast lookup algorithm proposed by Bloom in 1970, which uses multiple hash functions. It is commonly used in scenarios where quick determination of whether an element belongs to a set is needed without requiring 100% accuracy. BloomFilter has the following characteristics: @@ -63,7 +63,7 @@ To check the effect of a BloomFilter index on a query, you can analyze relevant ::: -## Syntax +## Managing Indexes ### Creating a BloomFilter Index When Creating a Table @@ -97,12 +97,18 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## Using Indexes + +BloomFilter index is used to accelerate equality queries in the WHERE clause. It automatically takes effect when applicable, and there is no special syntax required. + +The acceleration effect of the BloomFilter index can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example Here is an example of how to create a BloomFilter index in Doris. -### Creating a BloomFilter Index - The BloomFilter index in Doris is created by adding the "bloom_filter_columns" property in the CREATE TABLE statement, with k1, k2, k3 being the key columns for the BloomFilter index. For example, the following creates BloomFilter indexes on saler_id and category_id. ```sql diff --git a/versioned_docs/version-2.1/table-design/index/inverted-index.md b/versioned_docs/version-2.1/table-design/index/inverted-index.md index 32f657803a5..fae75d68110 100644 --- a/versioned_docs/version-2.1/table-design/index/inverted-index.md +++ b/versioned_docs/version-2.1/table-design/index/inverted-index.md @@ -76,22 +76,15 @@ There are some limitations to using inverted indexes: 1. Floating-point types FLOAT and DOUBLE, which have precision issues, do not support inverted indexes due to inaccurate precision. The solution is to use the precisely accurate DECIMAL type, which supports inverted indexes. -2. Some complex data types do not yet support inverted indexes, including MAP, STRUCT, JSON, HLL, BITMAP, QUANTILE_STATE, AGG_STATE. MAP and STRUCT will gradually gain support, and JSON can be replaced with the VARIANT type for support. The other types do not need support for inverted indexes due to their specific uses. +2. Some complex data types do not yet support inverted indexes, including MAP, STRUCT, JSON, HLL, BITMAP, QUANTILE_STATE, AGG_STATE. 3. DUPLICATE and UNIQUE table models with Merge-on-Write enabled support building inverted indexes on any column. However, AGGREGATE and UNIQUE models without Merge-on-Write enabled only support building inverted indexes on Key columns, as non-Key columns cannot have inverted indexes. This is because these two models require reading all data for merging, so indexes cannot be used for pre-filtering. -To see the effect of inverted indexes on a query, you can analyze relevant metrics in the Query Profile. - -- InvertedIndexFilterTime: time consumed by the inverted index - - InvertedIndexSearcherOpenTime: time to open the inverted index - - InvertedIndexSearcherSearchTime: time for internal queries of the inverted index - -- RowsInvertedIndexFiltered: number of rows filtered by the inverted index, can be compared with other Rows values to analyze the filtering effect of the BloomFilter index ::: -## Syntax +## Managing Indexes -### Define Inverted Indexes When Creating a Table +### Defining Inverted Indexes When Creating a Table In the table creation statement, after the COLUMN definition, is the index definition: @@ -239,7 +232,7 @@ CANCEL BUILD INDEX ON table_name (job_id1, job_id2, ...); `BUILD INDEX` creates an asynchronous task executed by multiple threads on each BE. The number of threads can be set using the BE config `alter_index_worker_count`, with a default value of 3. -In versions before 2.1.4, `BUILD INDEX` would keep retrying until it succeeded. Starting from these versions, failure and timeout mechanisms prevent endless retries. +In versions before and 2.1.4, `BUILD INDEX` would keep retrying until it succeeded. Starting from this version, failure and timeout mechanisms prevent endless retries. 1. If the majority of replicas for a tablet fail to `BUILD INDEX`, the entire `BUILD INDEX` operation fails. 2. If the time exceeds `alter_table_timeout_second`, the `BUILD INDEX` operation times out. @@ -262,6 +255,16 @@ ALTER TABLE table_name DROP INDEX idx_name; ::: +### Viewing Inverted Index + +-- Syntax 1: The INDEX section in the table schema with USING INVERTED indicates an inverted index +SHOW CREATE TABLE table_name; + +-- Syntax 2: IndexType as INVERTED indicates an inverted index +SHOW INDEX FROM idx_name; + +## Using Indexes + ### Accelerating Queries with Inverted Indexes ```sql @@ -317,28 +320,31 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### TOKENIZE Function +### Analyzing Index Acceleration Effects Through Profiles + +Inverted query acceleration can be toggled using the session variable `enable_inverted_index_query`, which is set to true by default. To verify the acceleration effect of the index, it can be set to false to turn it off. + +The acceleration effect of the inverted index can be analyzed using the following metrics in the Query Profile: +- RowsInvertedIndexFiltered: The number of rows filtered by the inverted index, which can be compared with other Rows values to analyze the filtering effect of the index. +- InvertedIndexFilterTime: The time consumed by the inverted index. + - InvertedIndexSearcherOpenTime: The time taken to open the inverted index. + - InvertedIndexSearcherSearchTime: The time taken for internal queries within the inverted index. + + +### Verifying Tokenization Effects Using Tokenization Functions To check the actual effect of tokenization or to tokenize a piece of text, you can use the `TOKENIZE` function for verification. The first parameter of the `TOKENIZE` function is the text to be tokenized, and the second parameter specifies the tokenization parameters used when creating the index. -```sql -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA 我爱我的祖国','"parser"="unicode"'); -+-------------------------------------------------------------------+ -| tokenize('I love CHINA 我爱我的祖国', '"parser"="unicode"') | -+-------------------------------------------------------------------+ -| ["i", "love", "china", "我", "爱", "我", "的", "祖", "国"] | -+-------------------------------------------------------------------+ -1 row in set (0.02 sec) ``` ## Usage Example @@ -384,7 +390,7 @@ PROPERTIES ("replication_num" = "1"); **Importing Data via Stream Load** -```json +``` wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load @@ -640,41 +646,39 @@ mysql> SELECT count() FROM hackernews_1m; -- Execute BUILD INDEX to add the inverted index for existing data mysql> BUILD INDEX idx_author ON hackernews_1m; Query OK, 0 rows affected (0.01 sec) - ``` - -- Creating an incremental index for 1 million author records took only 1.5 seconds. - - ```sql - mysql> SHOW ALTER TABLE COLUMN; - +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ - | JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout | - +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ - | 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 | hackernews_1m | 10031 | 10008 | 1:1994690496 | 3 | FINISHED | | NULL | 2592000 | - | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 | hackernews_1m | 10054 | 10008 | 1:378856428 | 4 | FINISHED | | NULL | 2592000 | - | 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 | hackernews_1m | 10077 | 10008 | 1:1335127701 | 5 | FINISHED | | NULL | 2592000 | - +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ - ``` +Creating an incremental index for 1 million author records took only 1.5 seconds. - ```sql - mysql> SHOW BUILD INDEX ORDER BY CreateTime DESC LIMIT 1; - +-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ - | JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | - +-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ - | 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) USING INVERTED], | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004 | FINISHED | | NULL | - +-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ - 1 row in set (0.01 sec) - ``` +```sql +mysql> SHOW ALTER TABLE COLUMN; ++-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ +| JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout | ++-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ +| 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 | hackernews_1m | 10031 | 10008 | 1:1994690496 | 3 | FINISHED | | NULL | 2592000 | +| 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 | hackernews_1m | 10054 | 10008 | 1:378856428 | 4 | FINISHED | | NULL | 2592000 | +| 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 | hackernews_1m | 10077 | 10008 | 1:1335127701 | 5 | FINISHED | | NULL | 2592000 | ++-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ +``` -- After creating the index, string equality matches also showed significant acceleration. +```sql +mysql> SHOW BUILD INDEX ORDER BY CreateTime DESC LIMIT 1; ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) USING INVERTED], | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004 | FINISHED | | NULL | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +1 row in set (0.01 sec) +``` - ```sql - mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster'; - +---------+ - | count() | - +---------+ - | 20 | - +---------+ - 1 row in set (0.01 sec) - ``` +-- After creating the index, string equality matches also showed significant acceleration. + +```sql +mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster'; ++---------+ +| count() | ++---------+ +| 20 | ++---------+ +1 row in set (0.01 sec) +``` diff --git a/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md b/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md index 71368ead8c0..be1a3a098c1 100644 --- a/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md +++ b/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The NGram BloomFilter index, similar to the BloomFilter index, is a skip list index based on BloomFilter. +The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter. Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. Instead of storing the original text values, it tokenizes the text using NGram and stores each token in the BloomFilter. For LIKE queries, the pattern in LIKE '%pattern%' is also tokenized using NGram. Each token is checked against the BloomFilter, and if any token is not found, the corresponding data block does not meet the LIKE condition and can be skipped, reducing IO and accelerating th [...] @@ -43,7 +43,7 @@ The NGram BloomFilter index can only accelerate string LIKE queries, and the num ::: -## Syntax +## Managing Indexes ### Creating an NGram BloomFilter Index @@ -67,8 +67,14 @@ Explanation of the syntax: ### Viewing NGram BloomFilter Index +-- Syntax 1: The INDEX section in the table schema with USING NGRAM_BF indicates an inverted index ```sql -SHOW CREATE TABLE table_ngrambf; +SHOW CREATE TABLE table_name; +``` + +-- Syntax 2: IndexType as NGRAM_BF indicates an inverted index +```sql +SHOW INDEX FROM idx_name; ``` ### Deleting an NGram BloomFilter Index @@ -85,6 +91,15 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## Using Indexes + +NGram BloomFilter index is used to accelerate LIKE queries, for example: +SELECT count() FROM table1 WHERE message LIKE '%error%'; + +The acceleration effect of the BloomFilter index (including NGram) can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example This section demonstrates the usage and effectiveness of the NGram BloomFilter index using a dataset of Amazon product reviews, `amazon_reviews`. diff --git a/versioned_docs/version-2.1/table-design/index/prefix-index.md b/versioned_docs/version-2.1/table-design/index/prefix-index.md index 7801c9fce61..74d2e26ffb4 100644 --- a/versioned_docs/version-2.1/table-design/index/prefix-index.md +++ b/versioned_docs/version-2.1/table-design/index/prefix-index.md @@ -42,17 +42,30 @@ The first 36 bytes of a row in a data block are used as the prefix index for tha Prefix indexes can speed up equality queries and range queries. +## Managing Indexes + +There is no specific syntax to define a prefix index. When creating a table, the first 36 bytes of the table's Key are automatically taken as the prefix index. + +### Recommendations for Prefix Index Selection + :::tip -Since the KEY definition of a table is unique, a table can only have one type of prefix index. For queries using other columns that cannot hit the prefix index as conditions, the efficiency might not meet the requirements. There are two solutions: +Because the Key definition of a table is unique, a table has only one set of prefix indexes. Therefore, it is important to choose an appropriate prefix index when designing the table structure. The following recommendations can be considered: +1. Choose the fields most commonly used in WHERE filtering conditions as the Key. +2. Place the more frequently used fields at the front, as prefix indexes are only effective for fields in the WHERE condition that are part of the Key's prefix. + +For queries that use other columns not covered by the prefix index as conditions, the efficiency may not meet the requirements. There are two solutions: 1. Create an inverted index on the columns that require accelerated queries, as a table can have many inverted indexes. 2. For DUPLICATE tables, multi-prefix indexes can be indirectly achieved by creating corresponding strongly consistent materialized views with adjusted column orders. For more details, refer to query acceleration/materialized views. ::: -## Syntax +## Using Indexes + +Prefix indexes are used to accelerate equality and range queries in the WHERE clause. They automatically take effect when applicable, and there is no special syntax required. -There is no specific syntax for defining a prefix index. When creating a table, the first 36 bytes of the table's KEY are automatically taken as the prefix index. +The acceleration effect of the prefix index can be analyzed using the following metrics in the Query Profile: +- RowsKeyRangeFiltered: The number of rows filtered by the prefix index, which can be compared with other Rows values to analyze the filtering effect of the index. ## Example Usage diff --git a/versioned_docs/version-3.0/table-design/index/bloomfilter.md b/versioned_docs/version-3.0/table-design/index/bloomfilter.md index bbf569a6cd2..49094e3b853 100644 --- a/versioned_docs/version-3.0/table-design/index/bloomfilter.md +++ b/versioned_docs/version-3.0/table-design/index/bloomfilter.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The BloomFilter index is a type of skip list index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. +The BloomFilter index is a type of skip index based on BloomFilter. Its principle is to use BloomFilter to skip data blocks that do not meet the specified conditions for equality queries, thereby reducing IO and accelerating queries. BloomFilter is a fast lookup algorithm proposed by Bloom in 1970, which uses multiple hash functions. It is commonly used in scenarios where quick determination of whether an element belongs to a set is needed without requiring 100% accuracy. BloomFilter has the following characteristics: @@ -63,7 +63,7 @@ To check the effect of a BloomFilter index on a query, you can analyze relevant ::: -## Syntax +## Managing Indexes ### Creating a BloomFilter Index When Creating a Table @@ -97,12 +97,18 @@ ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name1,column_name2, ALTER TABLE table_name SET ("bloom_filter_columns" = "column_name2,column_name3"); ``` +## Using Indexes + +BloomFilter index is used to accelerate equality queries in the WHERE clause. It automatically takes effect when applicable, and there is no special syntax required. + +The acceleration effect of the BloomFilter index can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example Here is an example of how to create a BloomFilter index in Doris. -### Creating a BloomFilter Index - The BloomFilter index in Doris is created by adding the "bloom_filter_columns" property in the CREATE TABLE statement, with k1, k2, k3 being the key columns for the BloomFilter index. For example, the following creates BloomFilter indexes on saler_id and category_id. ```sql diff --git a/versioned_docs/version-3.0/table-design/index/inverted-index.md b/versioned_docs/version-3.0/table-design/index/inverted-index.md index 25e63fe8b65..4b5adfa771c 100644 --- a/versioned_docs/version-3.0/table-design/index/inverted-index.md +++ b/versioned_docs/version-3.0/table-design/index/inverted-index.md @@ -83,18 +83,11 @@ There are some limitations to using inverted indexes: 3. DUPLICATE and UNIQUE table models with Merge-on-Write enabled support building inverted indexes on any column. However, AGGREGATE and UNIQUE models without Merge-on-Write enabled only support building inverted indexes on Key columns, as non-Key columns cannot have inverted indexes. This is because these two models require reading all data for merging, so indexes cannot be used for pre-filtering. -To see the effect of inverted indexes on a query, you can analyze relevant metrics in the Query Profile. - -- InvertedIndexFilterTime: time consumed by the inverted index - - InvertedIndexSearcherOpenTime: time to open the inverted index - - InvertedIndexSearcherSearchTime: time for internal queries of the inverted index - -- RowsInvertedIndexFiltered: number of rows filtered by the inverted index, can be compared with other Rows values to analyze the filtering effect of the BloomFilter index ::: -## Syntax +## Managing Indexes -### Define Inverted Indexes When Creating a Table +### Defining Inverted Indexes When Creating a Table In the table creation statement, after the COLUMN definition, is the index definition: @@ -265,6 +258,16 @@ ALTER TABLE table_name DROP INDEX idx_name; ::: +### Viewing Inverted Index + +-- Syntax 1: The INDEX section in the table schema with USING INVERTED indicates an inverted index +SHOW CREATE TABLE table_name; + +-- Syntax 2: IndexType as INVERTED indicates an inverted index +SHOW INDEX FROM idx_name; + +## Using Indexes + ### Accelerating Queries with Inverted Indexes ```sql @@ -320,27 +323,31 @@ SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00'; SELECT * FROM table_name WHERE op_type IN ('add', 'delete'); ``` -### TOKENIZE Function +### Analyzing Index Acceleration Effects Through Profiles + +Inverted query acceleration can be toggled using the session variable `enable_inverted_index_query`, which is set to true by default. To verify the acceleration effect of the index, it can be set to false to turn it off. + +The acceleration effect of the inverted index can be analyzed using the following metrics in the Query Profile: +- RowsInvertedIndexFiltered: The number of rows filtered by the inverted index, which can be compared with other Rows values to analyze the filtering effect of the index. +- InvertedIndexFilterTime: The time consumed by the inverted index. + - InvertedIndexSearcherOpenTime: The time taken to open the inverted index. + - InvertedIndexSearcherSearchTime: The time taken for internal queries within the inverted index. + + +### Verifying Tokenization Effects Using Tokenization Functions To check the actual effect of tokenization or to tokenize a piece of text, you can use the `TOKENIZE` function for verification. The first parameter of the `TOKENIZE` function is the text to be tokenized, and the second parameter specifies the tokenization parameters used when creating the index. -mysql> SELECT TOKENIZE('I love CHINA','"parser"="english"'); +mysql> SELECT TOKENIZE('I love Doris','"parser"="english"'); +------------------------------------------------+ -| tokenize('I love CHINA', '"parser"="english"') | +| tokenize('I love Doris', '"parser"="english"') | +------------------------------------------------+ -| ["i", "love", "china"] | +| ["i", "love", "doris"] | +------------------------------------------------+ 1 row in set (0.02 sec) -mysql> SELECT TOKENIZE('I love CHINA 我爱我的祖国','"parser"="unicode"'); -+-------------------------------------------------------------------+ -| tokenize('I love CHINA 我爱我的祖国', '"parser"="unicode"') | -+-------------------------------------------------------------------+ -| ["i", "love", "china", "我", "爱", "我", "的", "祖", "国"] | -+-------------------------------------------------------------------+ -1 row in set (0.02 sec) ``` ## Usage Example @@ -387,7 +394,7 @@ PROPERTIES ("replication_num" = "1"); **Importing Data via Stream Load** ``` -wget https://doris-build-1308700295.cos.ap-beijing.myqcloud.com/regression/index/hacknernews_1m.csv.gz +wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load { diff --git a/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md b/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md index 71368ead8c0..be1a3a098c1 100644 --- a/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md +++ b/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md @@ -27,7 +27,7 @@ under the License. ## Indexing Principles -The NGram BloomFilter index, similar to the BloomFilter index, is a skip list index based on BloomFilter. +The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter. Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. Instead of storing the original text values, it tokenizes the text using NGram and stores each token in the BloomFilter. For LIKE queries, the pattern in LIKE '%pattern%' is also tokenized using NGram. Each token is checked against the BloomFilter, and if any token is not found, the corresponding data block does not meet the LIKE condition and can be skipped, reducing IO and accelerating th [...] @@ -43,7 +43,7 @@ The NGram BloomFilter index can only accelerate string LIKE queries, and the num ::: -## Syntax +## Managing Indexes ### Creating an NGram BloomFilter Index @@ -67,8 +67,14 @@ Explanation of the syntax: ### Viewing NGram BloomFilter Index +-- Syntax 1: The INDEX section in the table schema with USING NGRAM_BF indicates an inverted index ```sql -SHOW CREATE TABLE table_ngrambf; +SHOW CREATE TABLE table_name; +``` + +-- Syntax 2: IndexType as NGRAM_BF indicates an inverted index +```sql +SHOW INDEX FROM idx_name; ``` ### Deleting an NGram BloomFilter Index @@ -85,6 +91,15 @@ CREATE INDEX idx_column_name2(column_name2) ON table_ngrambf USING NGRAM_BF PROP ALTER TABLE table_ngrambf ADD INDEX idx_column_name2(column_name2) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'username ngram_bf index'; ``` +## Using Indexes + +NGram BloomFilter index is used to accelerate LIKE queries, for example: +SELECT count() FROM table1 WHERE message LIKE '%error%'; + +The acceleration effect of the BloomFilter index (including NGram) can be analyzed using the following metrics in the Query Profile: +- RowsBloomFilterFiltered: The number of rows filtered by the BloomFilter index, which can be compared with other Rows values to analyze the filtering effect of the index. +- BlockConditionsFilteredBloomFilterTime: The time consumed by the BloomFilter inverted index. + ## Usage Example This section demonstrates the usage and effectiveness of the NGram BloomFilter index using a dataset of Amazon product reviews, `amazon_reviews`. diff --git a/versioned_docs/version-3.0/table-design/index/prefix-index.md b/versioned_docs/version-3.0/table-design/index/prefix-index.md index 7801c9fce61..74d2e26ffb4 100644 --- a/versioned_docs/version-3.0/table-design/index/prefix-index.md +++ b/versioned_docs/version-3.0/table-design/index/prefix-index.md @@ -42,17 +42,30 @@ The first 36 bytes of a row in a data block are used as the prefix index for tha Prefix indexes can speed up equality queries and range queries. +## Managing Indexes + +There is no specific syntax to define a prefix index. When creating a table, the first 36 bytes of the table's Key are automatically taken as the prefix index. + +### Recommendations for Prefix Index Selection + :::tip -Since the KEY definition of a table is unique, a table can only have one type of prefix index. For queries using other columns that cannot hit the prefix index as conditions, the efficiency might not meet the requirements. There are two solutions: +Because the Key definition of a table is unique, a table has only one set of prefix indexes. Therefore, it is important to choose an appropriate prefix index when designing the table structure. The following recommendations can be considered: +1. Choose the fields most commonly used in WHERE filtering conditions as the Key. +2. Place the more frequently used fields at the front, as prefix indexes are only effective for fields in the WHERE condition that are part of the Key's prefix. + +For queries that use other columns not covered by the prefix index as conditions, the efficiency may not meet the requirements. There are two solutions: 1. Create an inverted index on the columns that require accelerated queries, as a table can have many inverted indexes. 2. For DUPLICATE tables, multi-prefix indexes can be indirectly achieved by creating corresponding strongly consistent materialized views with adjusted column orders. For more details, refer to query acceleration/materialized views. ::: -## Syntax +## Using Indexes + +Prefix indexes are used to accelerate equality and range queries in the WHERE clause. They automatically take effect when applicable, and there is no special syntax required. -There is no specific syntax for defining a prefix index. When creating a table, the first 36 bytes of the table's KEY are automatically taken as the prefix index. +The acceleration effect of the prefix index can be analyzed using the following metrics in the Query Profile: +- RowsKeyRangeFiltered: The number of rows filtered by the prefix index, which can be compared with other Rows values to analyze the filtering effect of the index. ## Example Usage --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org