Konstantin Orlov created IGNITE-25366:
-----------------------------------------

             Summary: Documentation. Describe output of EXPLAIN command
                 Key: IGNITE-25366
                 URL: https://issues.apache.org/jira/browse/IGNITE-25366
             Project: Ignite
          Issue Type: Improvement
          Components: documentation ai3
            Reporter: Konstantin Orlov


Let's update documentation with page describing general concept of EXPLAIN 
output, which is provided below.

Every relational operator is described with a name and set of attributes:
{code}
NameOfTheRelationalOperator
    attribute1: value1
    attribute2: value2

{code}

Let’s take a look at few examples:
{code}
TableScan                        // Full table access
    table: PUBLIC.EMP1           // Name of the table in question
    fields: [NAME, SALARY]       // List of columns to return
    est: (rows=1)                // Estimated number of rows returned

IndexScan                        // Index scan
    table: PUBLIC.TEST_TBL       // Name of the table in question
    index: IDX_DESC              // Name of the index in question
    type: SORTED                 // Type of the index
    fields: [C1]                 // List of columns to return
    collation: [C1 DESC]         // Collation of the index aka order of sorting
    est: (rows=1)                // Estimated number of rows returned

Sort
    collation: [C1 DESC NULLS LAST] // Collation to sort input rows
    est: (rows=1)                   // Estimated number of rows returned
{code}

Name represents a particular algorithm used to execute relation operators 
({{TableScan}} vs {{IndexScan}}, {{HashJoin}} vs {{MergeJoin}} vs 
{{NestedLoopJoin}}, etc). The set of attributes depends on the particular 
relational operator.

A query plan is represented by a tree-like structure which is the composition 
of nodes described above. This tree describes a data flow, where rows are 
passed from leaves to a plan root node (root node is the topmost node, it also 
has no indentation). Let's take a look at few examples:
{code}
// simple ordered select where desired order matches collation of existing index
EXPLAIN PLAN FOR SELECT c1 FROM test_tbl ORDER BY c1 DESC NULLS FIRST

Exchange
    distribution: single
    est: (rows=1)
  IndexScan
      table: PUBLIC.TEST_TBL
      index: IDX_DESC
      type: SORTED
      fields: [C1]
      collation: [C1 DESC]
      est: (rows=1)

// similar query, but desired order doesn't match index collation. Mind the 
additional
// SORT node.
EXPLAIN PLAN FOR SELECT c1 FROM test_tbl ORDER BY c1 DESC NULLS LAST

Exchange
    distribution: single
    est: (rows=1)
  Sort
      collation: [C1 DESC NULLS LAST]
      est: (rows=1)
    TableScan
        table: PUBLIC.TEST_TBL
        fields: [C1]
        est: (rows=1)

// Mind the "fetch" attribute of Sort node. It denotes TopN sort algorithm,
// implying that only N nodes will be kept in memory. This also implies
// that only N rows from every node will be transferred over Exchange.  
SELECT * FROM test ORDER BY pk FETCH FIRST ? ROWS ONLY

Limit
    fetch: ?0
    est: (rows=1)
  Exchange
      distribution: single
      est: (rows=1)
    Sort
        collation: [PK ASC]
        fetch: ?0
        est: (rows=1)
      TableScan
          table: PUBLIC.TEST
          fields: [PK, COL0]
          est: (rows=1)


// Similar query, but Limit node wasn't pushed down the exchange. This implies
// that the whole dataset will be transferred over an Exchange.
SELECT * FROM test OFFSET ? ROWS FETCH FIRST ? ROWS ONLY 

Limit
    offset: ?0
    fetch: ?1
    est: (rows=1)
  Exchange
      distribution: single
      est: (rows=1)
    TableScan
        table: PUBLIC.TEST
        fields: [PK, COL0]
        est: (rows=1)

// More complex plan
EXPLAIN PLAN FOR SELECT 
    U.UserName, P.ProductName, R.ReviewText, R.Rating
 FROM Users U, Reviews R, Products P
WHERE U.UserID = R.UserID
  AND R.ProductID = P.ProductID
  AND P.ProductName = 'Product_' || ?::varchar;

Project
    fields: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    exprs: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    est: (rows=16650)
  HashJoin
      condition: =(USERID0, USERID)
      joinType: inner
      est: (rows=16650)
    HashJoin
        condition: =(PRODUCTID, PRODUCTID0)
        joinType: inner
        est: (rows=16650)
      Exchange
          distribution: single
          est: (rows=50000)
        TableScan
            table: PUBLIC.REVIEWS
            fields: [PRODUCTID, USERID, REVIEWTEXT, RATING]
            est: (rows=50000)
      Exchange
          distribution: single
          est: (rows=1)665
        TableScan
            table: PUBLIC.PRODUCTS
            filters: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR 
CHARACTER SET "UTF-8"))
            fields: [PRODUCTID, PRODUCTNAME]
            est: (rows=1665)
    Exchange
        distribution: single
        est: (rows=10000)
      TableScan
          table: PUBLIC.USERS
          fields: [USERID, USERNAME]
          est: (rows=10000)
{code}

We also should provide exhaustive list of relational operators their attributes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to