korlov42 commented on code in PR #6163:
URL: https://github.com/apache/ignite-3/pull/6163#discussion_r2182001191
########## docs/_docs/sql-reference/explain-operators-list.adoc: ##########
@@ -0,0 +1,505 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+= List Of Operators
+
+This section enumerates all operators with their semantics and supported attributes.
+
+== ColocatedHashAggregate
+
+The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping keys.
+Colocated aggregate assumes that the data is already distributed according to the grouping keys; therefore, aggregation can be completed locally in a single pass.
+The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.
+The output rows are composed as follows: first come the columns participating in the grouping keys, in the order they are enumerated in the `group` attribute, followed by the results of the accumulators, in the order they are enumerated in the `aggregation` attribute.
+
+Attributes:
+
+- `group`: Set of grouping columns.
+- `groupSets`: List of group key definitions for advanced grouping, like CUBE or ROLLUP.
+Optional.
+- `aggregation`: List of accumulators.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== ColocatedSortAggregate
+
+The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping keys.
+Colocated aggregate assumes that the data is already distributed according to the grouping keys; therefore, aggregation can be completed locally in a single pass.
+The sort aggregate operation leverages data ordered by the grouping expressions to calculate each grouping set tuple by tuple, in a streaming fashion.
+The output rows are composed as follows: first come the columns participating in the grouping keys, in the order they are enumerated in the `group` attribute, followed by the results of the accumulators, in the order they are enumerated in the `aggregation` attribute.
+
+Attributes:
+
+- `group`: Set of grouping columns.
+- `groupSets`: List of group key definitions for advanced grouping, like CUBE or ROLLUP.
+Optional.
+- `aggregation`: List of accumulators.
+- `collation`: List of columns and the sort order this operator relies on.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
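+
+For illustration, a grouping query of the following shape may be planned as a colocated aggregate when the grouping key matches the table's colocation key (the table and columns below are hypothetical; the actual plan depends on the schema and statistics):
+
+[source,sql]
+----
+-- Assumes a table EMP colocated by DEPT_ID; grouping by the colocation
+-- key lets every node aggregate its portion of the data in a single pass.
+EXPLAIN SELECT dept_id, SUM(salary) FROM emp GROUP BY dept_id;
+----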
+
+== MapHashAggregate
+
+The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping keys.
+Map aggregate is the first phase of a 2-phase aggregation.
+During the first phase, data is pre-aggregated, and the result is sent to the node where the REDUCE phase is executed.
+The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.
+The output rows are composed as follows: first come the columns participating in the grouping keys, in the order they are enumerated in the `group` attribute, followed by the results of the accumulators, in the order they are enumerated in the `aggregation` attribute.
+
+Attributes:
+
+- `group`: Set of grouping columns.
+- `groupSets`: List of group key definitions for advanced grouping, like CUBE or ROLLUP.
+Optional.
+- `aggregation`: List of accumulators.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== ReduceHashAggregate
+
+The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping keys.
+Reduce aggregate is the second phase of a 2-phase aggregation.
+During the second phase, all pre-aggregated data is merged together, and the final result is returned.
+The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.
+The output rows are composed as follows: first come the columns participating in the grouping keys, in the order they are enumerated in the `group` attribute, followed by the results of the accumulators, in the order they are enumerated in the `aggregation` attribute.
+
+Attributes:
+
+- `group`: Set of grouping columns.
+- `groupSets`: List of group key definitions for advanced grouping, like CUBE or ROLLUP.
+Optional.
+- `aggregation`: List of accumulators.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== MapSortAggregate
+
+The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping keys.
+Map aggregate is the first phase of a 2-phase aggregation.
+During the first phase, data is pre-aggregated, and the result is sent to the node where the REDUCE phase is executed.
+The sort aggregate operation leverages data ordered by the grouping expressions to calculate each grouping set tuple by tuple, in a streaming fashion.
+The output rows are composed as follows: first come the columns participating in the grouping keys, in the order they are enumerated in the `group` attribute, followed by the results of the accumulators, in the order they are enumerated in the `aggregation` attribute.
+
+Attributes:
+
+- `group`: Set of grouping columns.
+- `groupSets`: List of group key definitions for advanced grouping, like CUBE or ROLLUP.
+Optional.
+- `aggregation`: List of accumulators.
+- `collation`: List of columns and the sort order this operator relies on.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== ReduceSortAggregate
+
+The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping keys.
+Reduce aggregate is the second phase of a 2-phase aggregation.
+During the second phase, all pre-aggregated data is merged together, and the final result is returned.
+The sort aggregate operation leverages data ordered by the grouping expressions to calculate each grouping set tuple by tuple, in a streaming fashion.
+The output rows are composed as follows: first come the columns participating in the grouping keys, in the order they are enumerated in the `group` attribute, followed by the results of the accumulators, in the order they are enumerated in the `aggregation` attribute.
+
+Attributes:
+
+- `group`: Set of grouping columns.
+- `groupSets`: List of group key definitions for advanced grouping, like CUBE or ROLLUP.
+Optional.
+- `aggregation`: List of accumulators.
+- `collation`: List of columns and the sort order this operator relies on.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
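+
+For illustration, grouping by a column that does not match how the table's data is distributed typically produces a MAP/REDUCE aggregate pair (hypothetical schema; the actual plan depends on the schema and statistics):
+
+[source,sql]
+----
+-- Assumes EMP is not colocated by CITY: every node pre-aggregates its
+-- local rows (MAP phase), then the partial results are merged into the
+-- final answer (REDUCE phase).
+EXPLAIN SELECT city, COUNT(*) FROM emp GROUP BY city;
+----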
+
+== ColocatedIntersect
+
+Returns all records from the primary input that are present in every secondary input.
+If `all` is `true`, then for each specific record returned, the output contains min(m, n1, n2, …, n) copies.
+Otherwise, duplicates are eliminated.
+
+Attributes:
+
+- `all`: If `true`, then the output may contain duplicates.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== ColocatedMinus
+
+Returns all records from the primary input excluding any matching records from secondary inputs.
+If `all` is `true`, then for each specific record returned, the output contains max(0, m - sum(n1, n2, …, n)) copies.
+Otherwise, duplicates are eliminated.
+
+Attributes:
+
+- `all`: If `true`, then the output may contain duplicates.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== MapIntersect
+
+Returns all records from the primary input that are present in every secondary input.
+Map intersect is the first phase of a 2-phase computation.
+During the first phase, data is pre-aggregated, and the result is sent to the node where the REDUCE phase is executed.
+
+Attributes:
+
+- `all`: If `true`, then the output may contain duplicates.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== ReduceIntersect
+
+Returns all records from the primary input that are present in every secondary input.
+Reduce intersect is the second phase of a 2-phase computation.
+During the second phase, all pre-aggregated data is merged together, and the final result is returned.
+If `all` is `true`, then for each specific record returned, the output contains min(m, n1, n2, …, n) copies.
+Otherwise, duplicates are eliminated.
+
+Attributes:
+
+- `all`: If `true`, then the output may contain duplicates.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== MapMinus
+
+Returns all records from the primary input excluding any matching records from secondary inputs.
+Map minus is the first phase of a 2-phase computation.
+During the first phase, data is pre-aggregated, and the result is sent to the node where the REDUCE phase is executed.
+
+Attributes:
+
+- `all`: If `true`, then the output may contain duplicates.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== ReduceMinus
+
+Returns all records from the primary input excluding any matching records from secondary inputs.
+Reduce minus is the second phase of a 2-phase computation.
+During the second phase, all pre-aggregated data is merged together, and the final result is returned.
+If `all` is `true`, then for each specific record returned, the output contains max(0, m - sum(n1, n2, …, n)) copies.
+Otherwise, duplicates are eliminated.
+
+Attributes:
+
+- `all`: If `true`, then the output may contain duplicates.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== UnionAll
+
+Concatenates results from multiple inputs without removing duplicates.
+
+Attributes:
+
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
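+
+For illustration, set operators appear in plans for queries of the following shape (hypothetical tables; whether the colocated or the MAP/REDUCE variant is chosen depends on how the inputs are distributed):
+
+[source,sql]
+----
+-- INTERSECT may be planned as an Intersect operator, EXCEPT as a Minus
+-- operator, and UNION ALL as UnionAll.
+EXPLAIN SELECT id FROM orders_2024 INTERSECT SELECT id FROM orders_2025;
+EXPLAIN SELECT id FROM orders_2024 EXCEPT SELECT id FROM orders_2025;
+EXPLAIN SELECT id FROM orders_2024 UNION ALL SELECT id FROM orders_2025;
+----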
+
+== Exchange
+
+Redistributes rows according to the specified distribution.
+
+Attributes:
+
+- `distribution`: A distribution strategy that describes how the rows are distributed across nodes.
+Possible values are:
+`single` (a single copy of data is available at a single node),
+`broadcast` (every participating node has its own copy of all the data),
+`random` (a single copy of data is partitioned and spread randomly across all participating nodes),
+`hash` (a single copy of data is partitioned and spread across nodes based on a hash function of the specified columns),
+`table` (a single copy of data is partitioned and spread across nodes with regard to the distribution of the specified table),
+`identity` (data is distributed with regard to the value of the specified column).
+- `est`: Estimated number of output rows.
+
+== TrimExchange
+
+Filters rows according to the specified distribution.
+This operator accepts input that is broadcast, i.e. every participating node has its own copy of all the data, and applies a predicate such that the output rows satisfy the specified distribution.
+
+Attributes:
+
+- `distribution`: A distribution strategy that describes how the rows are distributed across nodes.
+Possible values are:
+`random` (a single copy of data is partitioned and spread randomly across all participating nodes),
+`hash` (a single copy of data is partitioned and spread across nodes based on a hash function of the specified columns),
+`table` (a single copy of data is partitioned and spread across nodes with regard to the distribution of the specified table).
+- `est`: Estimated number of output rows.
+
+== Filter
+
+Filters rows according to the specified predicate condition.
+
+Attributes:
+
+- `predicate`: Filtering condition.
+- `est`: Estimated number of output rows.
+
+== HashJoin
+
+The join operation combines two separate inputs into a single output, based on a join expression.
+The hash join operator builds a hash table from the right input based on a set of join keys.
+It then probes that hash table with rows from the left input to find matches.
+
+Attributes:
+
+- `predicate`: A boolean condition that describes whether each row from the left set “matches” the row from the right set.
+- `type`: Type of the join (like INNER, LEFT, SEMI, etc.).
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== MergeJoin
+
+The join operation combines two separate inputs into a single output, based on a join expression.
+The merge join performs the join by taking advantage of two inputs that are sorted on the join keys.
+This allows the join operation to be done in a streaming fashion.
+
+Attributes:
+
+- `predicate`: A boolean condition that describes whether each row from the left set “matches” the row from the right set.
+- `type`: Type of the join (like INNER, LEFT, SEMI, etc.).
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== NestedLoopJoin
+
+The join operation combines two separate inputs into a single output, based on a join expression.
+The nested loop join operator performs the join by holding the entire right input and iterating over it for each row of the left input, evaluating the join expression on the Cartesian product of all rows and outputting only the rows where the expression is true.
+
+Attributes:
+
+- `predicate`: A boolean condition that describes whether each row from the left set “matches” the row from the right set.
+- `type`: Type of the join (like INNER, LEFT, SEMI, etc.).
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
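+
+For illustration, the optimizer picks among these join algorithms based on the join condition and the available orderings (hypothetical tables; the chosen algorithm depends on the schema and statistics):
+
+[source,sql]
+----
+-- An equi-join like this is a candidate for HashJoin or MergeJoin;
+-- a non-equi condition generally falls back to NestedLoopJoin.
+EXPLAIN SELECT e.name, d.name
+FROM emp e
+JOIN dept d ON e.dept_id = d.id;
+----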
+
+== CorrelatedNestedLoopJoin
+
+The join operation combines two separate inputs into a single output, based on a join expression.
+The correlated nested loop join operator performs the join by setting correlated variables in the execution context based on a row from the left input, and re-evaluating the right input with the updated context.
+
+Attributes:
+
+- `correlates`: Set of correlated variables which are set by the current relational operator.
+- `predicate`: A boolean condition that describes whether each row from the left set “matches” the row from the right set.
+- `type`: Type of the join (like INNER, LEFT, SEMI, etc.).
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== IndexScan
+
+Scans rows using a specified index.
+The `searchBounds` attribute specifies the boundaries of an index scan or lookup.
+If it is not specified, all rows will be read.
+A `predicate` is applied before `projection`.
+If `projection` is not specified, then `fieldNames` enumerates the columns returned from the table.
+
+Attributes:
+
+- `table`: Table being accessed.
+- `searchBounds`: List of bounds representing the boundaries of a range scan or point lookup.
+Optional.
+- `predicate`: Filtering condition.
+Optional.
+- `projection`: List of expressions to evaluate.
+Optional.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== TableScan
+
+Scans all rows from a table.
+A `predicate` is applied before `projection`.
+If `projection` is not specified, then `fieldNames` enumerates the columns returned from the table.
+
+Attributes:
+
+- `table`: Table being accessed.
+- `predicate`: Filtering condition.
+Optional.
+- `projection`: List of expressions to evaluate.
+Optional.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== KeyValueGet
+
+Optimized operator that leverages the Key-Value API in get-by-key queries.
+
+Attributes:
+
+- `table`: Table being accessed.
+- `key`: Key expression used to perform the lookup.
+- `predicate`: Filtering condition.
+Optional.
+- `projection`: List of expressions to evaluate.
+Optional.
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
+
+== KeyValueModify
+
+Optimized operator that leverages the Key-Value API in DML queries.
+
+Attributes:
+
+- `table`: Table being accessed.
+- `sourceExpression`: Source expressions used for row computations.
+- `type`: Type of data modification operation (e.g., INSERT, UPDATE, DELETE).
+- `fieldNames`: List of names of columns in produced rows.
+Optional.
+- `est`: Estimated number of output rows.
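+
+For illustration, single-key reads and single-row DML against the primary key are candidates for these optimized operators (hypothetical table; the optimizer falls back to a regular scan or modify node when the pattern does not apply):
+
+[source,sql]
+----
+-- A point lookup by primary key may be planned as KeyValueGet,
+-- and a single-row insert as KeyValueModify.
+EXPLAIN SELECT * FROM emp WHERE id = 42;
+EXPLAIN INSERT INTO emp (id, name) VALUES (42, 'John');
+----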
+
+== Limit
+
+Limits the number of returned rows, with an optional offset.
+
+Attributes:
+
+- `fetch`: Maximum number of rows to return.
+Optional.
+- `offset`: Number of rows to skip.
+Optional.
+- `est`: Estimated number of output rows.
+
+== Project
+
+Projects specified expressions or columns from the input.
+
+.Attributes:

Review Comment:
   you're right, fixed

########## docs/_docs/sql-reference/explain-statement.adoc: ##########
@@ -0,0 +1,257 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+= EXPLAIN Command
+
+The `EXPLAIN` command is used to display the execution plan of a SQL query, showing how the query will be processed by the SQL engine.
+It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step.
+This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions.
+
+== Syntax
+
+[.diagram-container]
+Diagram(
+  Terminal('EXPLAIN'),
+  Optional(
+    Sequence(
+      Choice(
+        0,
+        Terminal('PLAN'),
+        Terminal('MAPPING')
+      ),
+      Terminal('FOR')
+    )
+  ),
+  NonTerminal('query_or_dml')
+)
+
+If neither `PLAN` nor `MAPPING` is specified, then `PLAN` is implicit.
+
+Parameters:
+
+- `PLAN` - explains the query in terms of a tree of relational operators.
+This representation is suitable for investigating performance issues related to the optimizer.
+
+- `MAPPING` - explains the query in terms of the mapping of query fragments to particular nodes of the cluster.
+This representation is suitable for investigating performance issues related to data colocation.
+
+Examples:
+
+[source,sql]
+----
+EXPLAIN SELECT * FROM lineitem;
+EXPLAIN PLAN FOR SELECT * FROM lineitem;
+EXPLAIN MAPPING FOR SELECT * FROM lineitem;
+----
+
+== Understanding The Output
+
+Each query plan is represented as a tree-like structure composed of **relational operators**.
+
+A node in the plan includes:
+
+- A **name**, indicating the relational operator (e.g., `TableScan`, `IndexScan`, `Sort`, `Join` types)
+- A set of **attributes**, relevant to that specific operator
+
+[source,text]
+----
+OperatorName
+    attribute1: value1
+    attribute2: value2
+----
+
+Examples:
+
+[source,text]
+----
+TableScan // Full table access
+    table: PUBLIC.EMP
+    fieldNames: [NAME, SALARY]
+    est: (rows=1)
+
+IndexScan // Index-based access
+    table: PUBLIC.EMP
+    index: EMP_NAME_DESC_IDX
+    type: SORTED
+    fields: [NAME]
+    collation: [NAME DESC]
+    est: (rows=1)
+
+Sort // Sort rows
+    collation: [NAME DESC NULLS LAST]
+    est: (rows=1)
+----
+
+=== Operator Naming
+
+The operator name reflects the specific algorithm or strategy used.
+For example:
+
+- `TableScan` – Full scan of a base table
+- `IndexScan` – Access via index, possibly sorted
+- `Sort` – Explicit sorting step
+- `HashJoin`, `MergeJoin`, `NestedLoopJoin` – Types of join algorithms
+- `Limit`, `Project`, `Exchange` – Execution-related transformations and controls
+
+=== Hierarchical Plan Structure
+
+The plan is structured as a **tree**, where:
+
+- **Leaf nodes** represent data sources (e.g., `TableScan`)
+- **Internal nodes** represent data transformations (e.g., `Join`, `Sort`)
+- **The root node** (topmost) is the final operator that produces the result
+
+=== Example: Complex Join
+
+[source,sql]
+----
+EXPLAIN SELECT
+    U.UserName, P.ProductName, R.ReviewText, R.Rating
+  FROM Users U, Reviews R, Products P
+  WHERE U.UserID = R.UserID
+    AND R.ProductID = P.ProductID
+    AND P.ProductName = 'Product_' || ?::varchar
+----
+
+The resulting output is:
+
+[example]
+----
+Project
+    fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
+    projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
+    est: (rows=16650)
+  HashJoin
+      predicate: =(USERID$0, USERID)
+      fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
+      type: inner
+      est: (rows=16650)
+    HashJoin
+        predicate: =(PRODUCTID, PRODUCTID$0)
+        fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
+        type: inner
+        est: (rows=16650)
+      Exchange
+          distribution: single
+          est: (rows=50000)
+        TableScan
+            table: PUBLIC.REVIEWS
+            fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
+            est: (rows=50000)
+      Exchange
+          distribution: single
+          est: (rows=1665)
+        TableScan
+            table: PUBLIC.PRODUCTS
+            predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
+            fieldNames: [PRODUCTID, PRODUCTNAME]
+            est: (rows=1665)
+    Exchange
+        distribution: single
+        est: (rows=10000)
+      TableScan
+          table: PUBLIC.USERS
+          fieldNames: [USERID, USERNAME]
+          est: (rows=10000)
+----
+
+=== Query Mapping
+
+The result of the EXPLAIN MAPPING command includes additional metadata that provides insight into how the query is mapped onto the cluster topology:
+
+[example]
+----
+Fragment#0 root
+  distribution: single
+  executionNodes: [node_1]
+  tree:
+    Project
+        fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
+        projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
+        est: (rows=1)
+      HashJoin
+          predicate: =(USERID$0, USERID)
+          fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
+          type: inner
+          est: (rows=1)
+        HashJoin
+            predicate: =(PRODUCTID, PRODUCTID$0)
+            fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
+            type: inner
+            est: (rows=1)
+          Receiver
+              fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
+              sourceFragmentId: 1
+              est: (rows=1)
+          Receiver
+              fieldNames: [PRODUCTID, PRODUCTNAME]
+              sourceFragmentId: 2
+              est: (rows=1)
+        Receiver
+            fieldNames: [USERID, USERNAME]
+            sourceFragmentId: 3
+            est: (rows=1)
+
+Fragment#1
+  distribution: random
+  executionNodes: [node_1, node_2, node_3]
+  partitions: [REVIEWS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
+  tree:
+    Sender
+        distribution: single
+        targetFragmentId: 0
+        est: (rows=50000)
+      TableScan
+          table: PUBLIC.REVIEWS
+          fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
+          est: (rows=50000)
+
+Fragment#2
+  distribution: table PUBLIC.PRODUCTS in zone "Default"
+  executionNodes: [node_1, node_2, node_3]
+  partitions: [PRODUCTS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
+  tree:
+    Sender
+        distribution: single
+        targetFragmentId: 0
+        est: (rows=1665)
+      TableScan
+          table: PUBLIC.PRODUCTS
+          predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
+          fieldNames: [PRODUCTID, PRODUCTNAME]
+          est: (rows=1665)
+
+Fragment#3
+  distribution: table PUBLIC.USERS in zone "Default"
+  executionNodes: [node_1, node_2, node_3]
+  partitions: [USERS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
+  tree:
+    Sender
+        distribution: single
+        targetFragmentId: 0
+        est: (rows=10000)
+      TableScan
+          table: PUBLIC.USERS
+          fieldNames: [USERID, USERNAME]
+          est: (rows=10000)
+----
+
+where:
+
+- **Fragment#0** means the fragment with id=0
+- A **root** marker denotes the root fragment, i.e. the fragment that represents the user's cursor
+- A **distribution** attribute provides insight into which mapping strategy was applied to this particular fragment
+- An **executionNodes** attribute provides a list of nodes this fragment will be executed on

Review Comment:
   fixed, thanks

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@ignite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org