[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #3808: [Doc] Add docs to OLAP_SCAN_NODE query profile

GitBox Fri, 12 Jun 2020 03:08:10 -0700


EmmyMiao87 commented on a change in pull request #3808:
URL: https://github.com/apache/incubator-doris/pull/3808#discussion_r439329406




##########
File path: docs/zh-CN/administrator-guide/running-profile.md
##########
@@ -130,22 +133,105 @@ The BE side collects many statistics; the meaning of each parameter is listed below:
  - HashProbe: number of HashTable lookups
  - HashTravelLength: number of steps moved during HashTable lookups
 
-#### OLAP_SCAN_NODE:
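- - RowsProduced: number of result rows produced
- - BytesRead: total amount of data scanned by the scan node
- - TotalReadThroughput: throughput
- - TabletCount: number of tablets
- - RowsPushedCondFiltered: pushed-down filters
- - RawRowsRead: number of rows actually read
- - RowsReturned: number of rows returned by this node
- - RowsReturnedRate: rate at which rows are returned
- - PeakMemoryUsage: peak memory usage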
-
-#### Buffer pool:
+#### `OLAP_SCAN_NODE`
+
+The `OLAP_SCAN_NODE` node performs the actual data scanning. One `OLAP_SCAN_NODE` spawns one or more `OlapScanner` threads, and each Scanner thread scans part of the data.
+
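+Some or all of the predicates in a query are pushed to `OLAP_SCAN_NODE`. Some of these predicates are pushed further down to the storage engine, so that the storage engine's indexes can be used to filter data. The rest stay in `OLAP_SCAN_NODE` and filter the data returned from the storage engine.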
+
+A typical `OLAP_SCAN_NODE` profile is shown below. Some metrics have different meanings depending on the storage format (V1 or V2).
+
+```
+OLAP_SCAN_NODE (id=0):(Active: 4.050ms, non-child: 35.68%)
+   - BitmapIndexFilterCount: 0  # Number of rows filtered out using the bitmap index.
+   - BitmapIndexFilterTimer: 0.000ns  # Time spent filtering data using the bitmap index.
+   - BlockConvertTime: 7.433ms  # Time spent converting a vectorized Block into a row-oriented RowBlock. The vectorized Block is a VectorizedRowBatch in V1 and a RowBlockV2 in V2.
+   - BlockFetchTime: 36.934ms  # Time the Rowset Reader spends fetching Blocks.
+   - BlockLoadTime: 23.368ms  # Time SegmentReader (V1) or SegmentIterator (V2) spends fetching blocks.
+   - BlockSeekCount: 0  # Number of block seeks performed while reading Segments.
+   - BlockSeekTime: 3.062ms  # Time spent on block seeks while reading Segments.
+   - BlocksLoad: 221  # Number of Blocks read.
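+   - BytesRead: 6.59 MB  # Amount of data read from data files. For example, reading 10 32-bit integers gives 10 * 4 B = 40 bytes. This value only reflects the fully expanded in-memory size of the data, not the actual IO size.
+   - CachedPagesNum: 0  # V2 only: with PageCache enabled, the number of Pages that hit the cache.
+   - CompressedBytesRead: 1.36 MB  # In V1, the size of the data read from files before decompression. In V2, the pre-decompression size of the Pages read that missed the PageCache.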
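+   - DecompressorTimer: 4.194ms  # Time spent decompressing data.
+   - IOTimer: 1.404ms  # Actual IO time spent reading data from the operating system.
+   - IndexLoadTime: 1.521ms  # V1 only: time spent reading the Index Stream.
+   - NumDiskAccess: 6  # Number of disks this ScanNode touches.
+   - NumScanners: 25  # Number of Scanners spawned by this ScanNode.
+   - PeakMemoryUsage: 0  # Not meaningful.
+   - PerReadThreadRawHdfsThroughput: 0.00 /sec  # Not meaningful.
+   - RawRowsRead: 141.71K  # Number of raw rows read in the storage engine. See below for details.
+   - ReaderInitTime: 16.515ms  # Time OlapScanner spends initializing the Reader. In V1 this includes building the MergeHeap. In V2 it includes creating the Iterators at each level and reading the first batch of Blocks.
+   - RowsBloomFilterFiltered: 0  # V2 only: number of rows filtered out by the BloomFilter index.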
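+   - RowsDelFiltered: 0  # In V1, the number of rows filtered out by delete conditions. In V2, it also includes rows filtered out by the BloomFilter and by some predicates.
+   - RowsPushedCondFiltered: 0  # Number of rows filtered out by predicates passed down at runtime, such as conditions passed from the BuildTable to the ProbeTable in a Join. This value is not accurate, because filtering stops if it proves ineffective.
+   - RowsRead: 132.78K  # Number of rows returned from the storage engine to the Scanner, excluding rows filtered out by the Scanner.
+   - RowsReturned: 132.78K  # Number of rows the ScanNode returns to the parent node.
+   - RowsReturnedRate: 32.78 M/sec  # RowsReturned/ActiveTime
+   - RowsStatsFiltered: 0  # In V2, the number of rows filtered out by predicates using the Zonemap. In V1, it also includes rows filtered out by the BloomFilter.
+   - RowsVectorPredFiltered: 0  # Number of rows filtered out by vectorized predicate filtering.
+   - ScanTime: 49.239ms  # Time the Scanner spends in calls to get_next().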
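+   - ScannerThreadsInvoluntaryContextSwitches: 0  # Not meaningful.
+   - ScannerThreadsTotalWallClockTime: 0.000ns  # Not meaningful.
+     - MaterializeTupleTime(*): 0.000ns  # Not meaningful.
+     - ScannerThreadsSysTime: 0.000ns  # Not meaningful.
+     - ScannerThreadsUserTime: 0.000ns  # Not meaningful.
+   - ScannerThreadsVoluntaryContextSwitches: 0  # Not meaningful.
+   - ShowHintsTime: 0.000ns  # Not meaningful in V2. In V1, time spent reading part of the data in order to split ScanRanges.
+   - TabletCount : 25  # Number of Tablets this ScanNode touches.
+   - TotalPagesNum: 0  # V2 only: total number of Pages read.
+   - TotalRawReadTime(*): 0.000ns  # Not meaningful.
+   - TotalReadThroughput: 0.00 /sec  # Not meaningful.
+   - UncompressedBytesRead: 4.28 MB  # In V1, the decompressed size of the data files read (if a file needs no decompression, its file size is counted directly). In V2, only the decompressed size of Pages that missed the PageCache is counted (if a Page needs no decompression, its Page size is counted directly).
+   - VectorPredEvalTime: 0.000ns  # Time spent on vectorized predicate filtering.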
+```
+
+* Notes on row counts in the V2 format
+
+    The row-count metrics in the Profile are:
+    
+    * RowsKeyRangeFiltered
+    * RowsBitmapIndexFiltered
+    * RowsBloomFilterFiltered
+    * RowsStatsFiltered
+    * RowsDelFiltered
+    * RawRowsRead
+    * RowsRead
+    * RowsReturned
+
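+    When a V2-format Segment is read, rows are first filtered by Key range (the query range formed by the prefix keys), and the number of rows filtered out is recorded in `RowsKeyRangeFiltered`. Next, the Bitmap index filters the data, and the number of rows filtered out is recorded in `RowsBitmapIndexFiltered`. Then the BloomFilter index filters the data, and the result is recorded in `RowsBloomFilterFiltered`. The value of `RowsBloomFilterFiltered` is the difference between the Segment's total row count (not the row count after Bitmap index filtering) and the number of rows remaining after BloomFilter filtering, so the rows filtered out by the BloomFilter may overlap with those filtered out by the Bitmap index.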
+
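+    `RowsStatsFiltered` records the number of rows filtered out by other predicates, including the predicates pushed down to the storage engine and the Delete conditions in the storage engine.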
+    
+    `RowsDelFiltered` includes the filtered rows recorded in `RowsBloomFilterFiltered` and `RowsStatsFiltered`.
+    
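+    `RawRowsRead` is the number of rows that still have to be read after the filtering above, while `RowsRead` is the number of rows finally returned to the Scanner. `RowsRead` is usually smaller than `RawRowsRead`, because the data may be aggregated on its way from the storage engine to the Scanner.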
+    
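+    `RowsReturned` is the number of rows the ScanNode finally returns to the parent node. `RowsReturned` is usually also smaller than `RowsRead`, because the predicates that were not pushed down to the storage engine are applied as a filter on the Scanner.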
+
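+    These metrics give a rough picture of how many rows the storage engine processes and how many rows remain after filtering. The `Rows***Filtered` group of metrics also shows whether query conditions were pushed down to the storage engine and how effective each index is at filtering.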
+    
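+    If `RawRowsRead` and `RowsRead` differ greatly, a large number of rows were aggregated, and aggregation can be time-consuming. If `RowsRead` and `RowsReturned` differ greatly, many rows were filtered in the Scanner. This means that many highly selective predicates were not pushed to the storage engine, and filtering in the Scanner is less efficient than filtering in the storage engine.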
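
The last paragraph of the diff amounts to a simple check over three counters, which could be sketched roughly as follows. This is only an illustrative Python sketch, not part of Doris: the helper name `diagnose_scan`, the `ratio_threshold` parameter and its default of 0.5 are assumptions made for the example, and the counters are assumed to be plain integers already parsed from the profile.

```python
def diagnose_scan(raw_rows_read, rows_read, rows_returned, ratio_threshold=0.5):
    """Return hints based on the gaps between the three row counters.

    Hypothetical helper: ratio_threshold is an arbitrary cut-off for
    "differs greatly", not a value defined by Doris.
    """
    hints = []
    # Rows lost between RawRowsRead and RowsRead are attributed to
    # aggregation on the way from the storage engine to the Scanner.
    if raw_rows_read > 0 and (raw_rows_read - rows_read) / raw_rows_read > ratio_threshold:
        hints.append("many rows were aggregated")
    # Rows lost between RowsRead and RowsReturned were filtered in the
    # Scanner by predicates that were not pushed down.
    if rows_read > 0 and (rows_read - rows_returned) / rows_read > ratio_threshold:
        hints.append("many rows were filtered in the Scanner")
    return hints


# Counters from the sample profile (141.71K, 132.78K, 132.78K):
# both gaps are small, so no hint is produced.
print(diagnose_scan(141_710, 132_780, 132_780))
```

With the sample profile's values only about 6% of the raw rows disappear before reaching the Scanner and none are filtered afterwards, so under this assumed threshold neither condition fires.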

Review comment:
       OK, fine.




