IgGusev commented on code in PR #6163: URL: https://github.com/apache/ignite-3/pull/6163#discussion_r2191734835
########## docs/_docs/sql-reference/explain-operators-list.adoc: ########## @@ -0,0 +1,505 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. += List Of Operators + +This section enumerates all operators with their semantic and supported attributes. Review Comment: ```suggestion This section enumerates all operators for the EXPLAIN statement with their semantic and supported attributes. ``` ########## docs/_docs/sql-reference/explain-operators-list.adoc: ########## @@ -0,0 +1,505 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. += List Of Operators + +This section enumerates all operators with their semantic and supported attributes. + +== ColocatedHashAggregate + +The aggregate operation groups input data on one or more sets of grouping keys, calculating each aggregation function for each combination of grouping key. +Colocated aggregate assumes that the data is already distributed according to grouping keys, therefore aggregation can be completed locally in a single pass. +The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples. +The output rows are composed as follow: first come columns participated in grouping keys in the order they enumerated in `group` attribute, then come results of accumulators in the order they enumerated in `aggregation` attribute. + +Attributes: + +- `group`: Set of grouping columns. +- `groupSets`: List of group key definitions for advanced grouping, like CUBE or ROLLUP. Review Comment: We should put all optional attributes after mandatory attributes to make this easier for users. ########## docs/_docs/sql-reference/explain-statement.adoc: ########## @@ -0,0 +1,294 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. += EXPLAIN Command + +The `EXPLAIN` command is used to display the execution plan of a SQL query, showing how the query will be processed by the sql engine. Review Comment: ```suggestion The `EXPLAIN` command is used to display the execution plan of an SQL query, showing how the query will be processed by the sql engine. ``` ########## docs/_docs/sql-reference/explain-statement.adoc: ########## @@ -0,0 +1,294 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. += EXPLAIN Command + +The `EXPLAIN` command is used to display the execution plan of a SQL query, showing how the query will be processed by the sql engine. +It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step. +This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions. 
+ +== Syntax + +[.diagram-container] +Diagram( + Terminal('EXPLAIN'), + Optional( + Sequence( + Choice( + 0, + Terminal('PLAN'), + Terminal('MAPPING') + ), + Terminal('FOR') + ) + ), + NonTerminal('query_or_dml') +) + +If neither `PLAN` nor `MAPPING` is specified, then `PLAN` is implicit. + +Parameters: + +- `PLAN` - explains query in terms of relational operators tree. +This representation is suitable for investigation of performance issues related to the optimizer. + +- `MAPPING` - explains query in terms of mapping of query fragment to a particular node of the cluster. +This representation is suitable for investigation of performance issues related to the data colocation. + +Examples: + +[source,sql] +---- +EXPLAIN SELECT * FROM lineitem; +EXPLAIN PLAN FOR SELECT * FROM lineitem; +EXPLAIN MAPPING FOR SELECT * FROM lineitem; +---- + +== Understanding The Output + +Each query plan is represented as a tree-like structure composed of link:sql-reference/explain-operators-list[**relational operators**]. + +A node in the plan includes: + +- A **name**, indicating the relational operator (e.g., `TableScan`, `IndexScan`, `Sort`, `Join` types) +- A set of **attributes**, relevant to that specific operator + +[source,text] +---- +OperatorName + attribute1: value1 + attribute2: value2 +---- + +Examples: + +[source,text] +---- +TableScan // Full table access + table: PUBLIC.EMP + fieldNames: [NAME, SALARY] + est: (rows=1) + +IndexScan // Index-based access + table: PUBLIC.EMP + index: EMP_NAME_DESC_IDX + type: SORTED + fields: [NAME] + collation: [NAME DESC] + est: (rows=1) + +Sort // Sort rows + collation: [NAME DESC NULLS LAST] + est: (rows=1) +---- + +=== Operator Naming + +The operator name reflects the specific algorithm or strategy used. +For example: + +- `TableScan` – Full scan of a base table. +- `IndexScan` – Access via index, possibly sorted. +- `Sort` – Explicit sorting step. +- `HashJoin`, `MergeJoin`, `NestedLoopJoin` – Types of join algorithms. 
+- `Limit`, `Project`, `Exchange` – Execution-related transformations and controls. + +=== Hierarchical Plan Structure + +The plan is structured as a **tree**, where: + +- **Leaf nodes** represent data sources (e.g., `TableScan`) +- **Internal nodes** represent data transformations (e.g., `Join`, `Sort`) +- **The root node** (topmost) is the final operator that produces the result + +=== Example: Complex Join + +[source,sql] +---- +EXPLAIN SELECT + U.UserName, P.ProductName, R.ReviewText, R.Rating + FROM Users U, Reviews R, Products P + WHERE U.UserID = R.UserID + AND R.ProductID = P.ProductID + AND P.ProductName = 'Product_' || ?::varchar +---- + +The resulting output is: + +[example] +---- +Project + fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] + projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] + est: (rows=16650) + HashJoin + predicate: =(USERID$0, USERID) + fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME] + type: inner + est: (rows=16650) + HashJoin + predicate: =(PRODUCTID, PRODUCTID$0) + fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME] + type: inner + est: (rows=16650) + Exchange + distribution: single + est: (rows=50000) + TableScan + table: PUBLIC.REVIEWS + fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING] + est: (rows=50000) + Exchange + distribution: single + est: (rows=1665) + TableScan + table: PUBLIC.PRODUCTS + predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8")) + fieldNames: [PRODUCTID, PRODUCTNAME] + est: (rows=1665) + Exchange + distribution: single + est: (rows=10000) + TableScan + table: PUBLIC.USERS + fieldNames: [USERID, USERNAME] + est: (rows=10000) +---- + +This execution plan represents a query that joins three tables: `USERS`, `REVIEWS`, and `PRODUCTS`, and selects four fields after filtering by product name. 
+ +* **Project** (root node): +Outputs the final selected fields — `USERNAME`, `PRODUCTNAME`, `REVIEWTEXT`, and `RATING`. + +* **HashJoins** (two levels): +Perform the inner joins. +** The first (bottom-most) joins `REVIEWS` with `PRODUCTS` on `PRODUCTID`. +** The second joins the result with `USERS` on `USERID`. + +* **TableScans**: +Each table is scanned: +** `REVIEWS` is fully scanned. +** `PRODUCTS` is scanned with a filter on `PRODUCTNAME`. +** `USERS` is fully scanned. + +* **Exchange** nodes: +Indicate data redistribution between operators. + +Each node includes: + +- `fieldNames`: Output columns at that stage. +- `predicate`: Join or filter condition. +- `est`: Estimated number of rows at that point in the plan. + +=== Query Mapping + +A result of EXPLAIN MAPPING command includes additional metadata providing insight at how the query is mapped on cluster topology: Review Comment: We should add a command that returns this output. ########## docs/_docs/sql-reference/explain-statement.adoc: ########## @@ -0,0 +1,294 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. += EXPLAIN Command + +The `EXPLAIN` command is used to display the execution plan of a SQL query, showing how the query will be processed by the sql engine. 
+It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step. +This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions. + +== Syntax + +[.diagram-container] +Diagram( + Terminal('EXPLAIN'), + Optional( + Sequence( + Choice( + 0, + Terminal('PLAN'), + Terminal('MAPPING') + ), + Terminal('FOR') + ) + ), + NonTerminal('query_or_dml') +) + +If neither `PLAN` nor `MAPPING` is specified, then `PLAN` is implicit. + +Parameters: + +- `PLAN` - explains query in terms of relational operators tree. +This representation is suitable for investigation of performance issues related to the optimizer. + +- `MAPPING` - explains query in terms of mapping of query fragment to a particular node of the cluster. +This representation is suitable for investigation of performance issues related to the data colocation. + +Examples: + +[source,sql] +---- +EXPLAIN SELECT * FROM lineitem; +EXPLAIN PLAN FOR SELECT * FROM lineitem; +EXPLAIN MAPPING FOR SELECT * FROM lineitem; +---- + +== Understanding The Output + +Each query plan is represented as a tree-like structure composed of link:sql-reference/explain-operators-list[**relational operators**]. 
+ +A node in the plan includes: + +- A **name**, indicating the relational operator (e.g., `TableScan`, `IndexScan`, `Sort`, `Join` types) +- A set of **attributes**, relevant to that specific operator + +[source,text] +---- +OperatorName + attribute1: value1 + attribute2: value2 +---- + +Examples: + +[source,text] +---- +TableScan // Full table access + table: PUBLIC.EMP + fieldNames: [NAME, SALARY] + est: (rows=1) + +IndexScan // Index-based access + table: PUBLIC.EMP + index: EMP_NAME_DESC_IDX + type: SORTED + fields: [NAME] + collation: [NAME DESC] + est: (rows=1) + +Sort // Sort rows + collation: [NAME DESC NULLS LAST] + est: (rows=1) +---- + +=== Operator Naming + +The operator name reflects the specific algorithm or strategy used. +For example: + +- `TableScan` – Full scan of a base table. +- `IndexScan` – Access via index, possibly sorted. +- `Sort` – Explicit sorting step. +- `HashJoin`, `MergeJoin`, `NestedLoopJoin` – Types of join algorithms. +- `Limit`, `Project`, `Exchange` – Execution-related transformations and controls. + +=== Hierarchical Plan Structure + +The plan is structured as a **tree**, where: + +- **Leaf nodes** represent data sources (e.g., `TableScan`) +- **Internal nodes** represent data transformations (e.g., `Join`, `Sort`) +- **The root node** (topmost) is the final operator that produces the result + +=== Example: Complex Join Review Comment: Let's add a level 2 header, Examples, and nest the example sections under it, i.e. `== Examples`, then `=== Example: Complex Join`, ..., then `=== Example: Query Mapping`. ########## docs/_docs/sql-reference/explain-statement.adoc: ########## @@ -0,0 +1,294 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License.
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. += EXPLAIN Command + +The `EXPLAIN` command is used to display the execution plan of a SQL query, showing how the query will be processed by the sql engine. +It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step. +This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions. + +== Syntax + +[.diagram-container] +Diagram( + Terminal('EXPLAIN'), + Optional( + Sequence( + Choice( + 0, + Terminal('PLAN'), + Terminal('MAPPING') + ), + Terminal('FOR') + ) + ), + NonTerminal('query_or_dml') +) + +If neither `PLAN` nor `MAPPING` is specified, then `PLAN` is implicit. + +Parameters: + +- `PLAN` - explains query in terms of relational operators tree. +This representation is suitable for investigation of performance issues related to the optimizer. + +- `MAPPING` - explains query in terms of mapping of query fragment to a particular node of the cluster. +This representation is suitable for investigation of performance issues related to the data colocation. + +Examples: + +[source,sql] +---- +EXPLAIN SELECT * FROM lineitem; +EXPLAIN PLAN FOR SELECT * FROM lineitem; +EXPLAIN MAPPING FOR SELECT * FROM lineitem; +---- + +== Understanding The Output + +Each query plan is represented as a tree-like structure composed of link:sql-reference/explain-operators-list[**relational operators**]. 
+ +A node in the plan includes: + +- A **name**, indicating the relational operator (e.g., `TableScan`, `IndexScan`, `Sort`, `Join` types) +- A set of **attributes**, relevant to that specific operator + +[source,text] +---- +OperatorName + attribute1: value1 + attribute2: value2 +---- + +Examples: + +[source,text] +---- +TableScan // Full table access + table: PUBLIC.EMP + fieldNames: [NAME, SALARY] + est: (rows=1) + +IndexScan // Index-based access + table: PUBLIC.EMP + index: EMP_NAME_DESC_IDX + type: SORTED + fields: [NAME] + collation: [NAME DESC] + est: (rows=1) + +Sort // Sort rows + collation: [NAME DESC NULLS LAST] + est: (rows=1) +---- + +=== Operator Naming + +The operator name reflects the specific algorithm or strategy used. +For example: + +- `TableScan` – Full scan of a base table. +- `IndexScan` – Access via index, possibly sorted. +- `Sort` – Explicit sorting step. +- `HashJoin`, `MergeJoin`, `NestedLoopJoin` – Types of join algorithms. +- `Limit`, `Project`, `Exchange` – Execution-related transformations and controls. + +=== Hierarchical Plan Structure + +The plan is structured as a **tree**, where: + +- **Leaf nodes** represent data sources (e.g., `TableScan`) +- **Internal nodes** represent data transformations (e.g., `Join`, `Sort`) +- **The root node** (topmost) is the final operator that produces the result + +=== Example: Complex Join + +[source,sql] +---- +EXPLAIN SELECT Review Comment: ```suggestion EXPLAIN PLAN FOR SELECT ``` ########## docs/_docs/sql-reference/explain-statement.adoc: ########## @@ -0,0 +1,294 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. += EXPLAIN Command + +The `EXPLAIN` command is used to display the execution plan of a SQL query, showing how the query will be processed by the sql engine. +It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step. +This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions. + +== Syntax + +[.diagram-container] +Diagram( + Terminal('EXPLAIN'), + Optional( + Sequence( + Choice( + 0, + Terminal('PLAN'), + Terminal('MAPPING') + ), + Terminal('FOR') + ) + ), + NonTerminal('query_or_dml') +) + +If neither `PLAN` nor `MAPPING` is specified, then `PLAN` is implicit. + +Parameters: + +- `PLAN` - explains query in terms of relational operators tree. +This representation is suitable for investigation of performance issues related to the optimizer. + +- `MAPPING` - explains query in terms of mapping of query fragment to a particular node of the cluster. +This representation is suitable for investigation of performance issues related to the data colocation. + +Examples: + +[source,sql] +---- +EXPLAIN SELECT * FROM lineitem; +EXPLAIN PLAN FOR SELECT * FROM lineitem; +EXPLAIN MAPPING FOR SELECT * FROM lineitem; +---- + +== Understanding The Output + +Each query plan is represented as a tree-like structure composed of link:sql-reference/explain-operators-list[**relational operators**]. 
+ +A node in the plan includes: + +- A **name**, indicating the relational operator (e.g., `TableScan`, `IndexScan`, `Sort`, `Join` types) +- A set of **attributes**, relevant to that specific operator + +[source,text] +---- +OperatorName + attribute1: value1 + attribute2: value2 +---- + +Examples: + +[source,text] +---- +TableScan // Full table access + table: PUBLIC.EMP + fieldNames: [NAME, SALARY] + est: (rows=1) + +IndexScan // Index-based access + table: PUBLIC.EMP + index: EMP_NAME_DESC_IDX + type: SORTED + fields: [NAME] + collation: [NAME DESC] + est: (rows=1) + +Sort // Sort rows + collation: [NAME DESC NULLS LAST] + est: (rows=1) +---- + +=== Operator Naming + +The operator name reflects the specific algorithm or strategy used. +For example: + +- `TableScan` – Full scan of a base table. +- `IndexScan` – Access via index, possibly sorted. +- `Sort` – Explicit sorting step. +- `HashJoin`, `MergeJoin`, `NestedLoopJoin` – Types of join algorithms. +- `Limit`, `Project`, `Exchange` – Execution-related transformations and controls. 
+ +=== Hierarchical Plan Structure + +The plan is structured as a **tree**, where: + +- **Leaf nodes** represent data sources (e.g., `TableScan`) +- **Internal nodes** represent data transformations (e.g., `Join`, `Sort`) +- **The root node** (topmost) is the final operator that produces the result + +=== Example: Complex Join + +[source,sql] +---- +EXPLAIN SELECT + U.UserName, P.ProductName, R.ReviewText, R.Rating + FROM Users U, Reviews R, Products P + WHERE U.UserID = R.UserID + AND R.ProductID = P.ProductID + AND P.ProductName = 'Product_' || ?::varchar +---- + +The resulting output is: + +[example] +---- +Project + fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] + projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING] + est: (rows=16650) + HashJoin + predicate: =(USERID$0, USERID) + fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME] + type: inner + est: (rows=16650) + HashJoin + predicate: =(PRODUCTID, PRODUCTID$0) + fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME] + type: inner + est: (rows=16650) + Exchange + distribution: single + est: (rows=50000) + TableScan + table: PUBLIC.REVIEWS + fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING] + est: (rows=50000) + Exchange + distribution: single + est: (rows=1665) + TableScan + table: PUBLIC.PRODUCTS + predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8")) + fieldNames: [PRODUCTID, PRODUCTNAME] + est: (rows=1665) + Exchange + distribution: single + est: (rows=10000) + TableScan + table: PUBLIC.USERS + fieldNames: [USERID, USERNAME] + est: (rows=10000) +---- + +This execution plan represents a query that joins three tables: `USERS`, `REVIEWS`, and `PRODUCTS`, and selects four fields after filtering by product name. + +* **Project** (root node): +Outputs the final selected fields — `USERNAME`, `PRODUCTNAME`, `REVIEWTEXT`, and `RATING`. + +* **HashJoins** (two levels): +Perform the inner joins. 
+** The first (bottom-most) joins `REVIEWS` with `PRODUCTS` on `PRODUCTID`. +** The second joins the result with `USERS` on `USERID`. + +* **TableScans**: +Each table is scanned: +** `REVIEWS` is fully scanned. +** `PRODUCTS` is scanned with a filter on `PRODUCTNAME`. +** `USERS` is fully scanned. + +* **Exchange** nodes: +Indicate data redistribution between operators. + +Each node includes: + +- `fieldNames`: Output columns at that stage. +- `predicate`: Join or filter condition. +- `est`: Estimated number of rows at that point in the plan. + +=== Query Mapping Review Comment: ```suggestion === Example: Query Mapping ```